1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
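///
/// As an illustrative sketch, for a tiny C fragment these rules mean:
///   int x;          // alloca: shadow(x) is poisoned
///   int y = x + 1;  // arithmetic: shadow(y) is derived from shadow(x)
///   if (y) { ... }  // branch on y: poisoned shadow triggers a report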
21///
22/// But there are differences too. The first and major one is
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
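///
/// For example, before a call the caller copies each argument's shadow into
/// __msan_param_tls; the callee reads it back from the same slots, and the
/// return value's shadow travels back through __msan_retval_tls.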
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
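///
/// For example, for "c = a + b" origin propagation is conceptually
///   origin(c) = (shadow(b) != 0) ? origin(b) : origin(a)
/// i.e. a select that prefers the origin of a dirty (poisoned) operand.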
56///
57/// Every aligned group of 4 consecutive bytes of application memory has one
58/// origin value associated with it. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needlessly overwriting the origin of the 4-byte region
66/// on a short (e.g. 1-byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of an application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, an atomic
72/// store to two disjoint locations cannot be done without a severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
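///
/// Conceptually, the instrumented sequences are:
///   atomic store:  store a clean shadow, then perform the app store;
///   atomic load:   perform the app load, then load the shadow.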
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics may only be visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach and generate calls to
101/// __msan_instrument_asm_store(ptr, size),
102/// which defers the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and return the pointers to shadow and origin memory.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted at the beginning of every instrumented function's entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
227
228// These constants must be kept in sync with the ones in msan.h.
229// TODO: increase size to match SVE/SVE2/SME/SME2 limits
230static const unsigned kParamTLSSize = 800;
231static const unsigned kRetvalTLSSize = 800;
232
233// Access sizes are powers of two: 1, 2, 4, 8.
234static const size_t kNumberOfAccessSizes = 4;
235
236/// Track origins of uninitialized values.
237///
238/// Adds a section to MemorySanitizer report that points to the allocation
239/// (stack or heap) the uninitialized bits came from originally.
241 "msan-track-origins",
242 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
243 cl::init(0));
244
245static cl::opt<bool> ClKeepGoing("msan-keep-going",
246 cl::desc("keep going after reporting a UMR"),
247 cl::Hidden, cl::init(false));
248
249static cl::opt<bool>
250 ClPoisonStack("msan-poison-stack",
251 cl::desc("poison uninitialized stack variables"), cl::Hidden,
252 cl::init(true));
253
255 "msan-poison-stack-with-call",
256 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
257 cl::init(false));
258
260 "msan-poison-stack-pattern",
261 cl::desc("poison uninitialized stack variables with the given pattern"),
262 cl::Hidden, cl::init(0xff));
263
264static cl::opt<bool>
265 ClPrintStackNames("msan-print-stack-names",
266 cl::desc("Print name of local stack variable"),
267 cl::Hidden, cl::init(true));
268
269static cl::opt<bool>
270 ClPoisonUndef("msan-poison-undef",
271 cl::desc("Poison fully undef temporary values. "
272 "Partially undefined constant vectors "
273 "are unaffected by this flag (see "
274 "-msan-poison-undef-vectors)."),
275 cl::Hidden, cl::init(true));
276
278 "msan-poison-undef-vectors",
279 cl::desc("Precisely poison partially undefined constant vectors. "
280 "If false (legacy behavior), the entire vector is "
281 "considered fully initialized, which may lead to false "
282 "negatives. Fully undefined constant vectors are "
283 "unaffected by this flag (see -msan-poison-undef)."),
284 cl::Hidden, cl::init(false));
285
287 "msan-precise-disjoint-or",
288 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
289 "disjointedness is ignored (i.e., 1|1 is initialized)."),
290 cl::Hidden, cl::init(false));
291
292static cl::opt<bool>
293 ClHandleICmp("msan-handle-icmp",
294 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
295 cl::Hidden, cl::init(true));
296
297static cl::opt<bool>
298 ClHandleICmpExact("msan-handle-icmp-exact",
299 cl::desc("exact handling of relational integer ICmp"),
300 cl::Hidden, cl::init(true));
301
303 "msan-switch-precision",
304 cl::desc("Controls the number of cases considered by MSan for LLVM switch "
305 "instructions. 0 means no UUMs detected. Higher values lead to "
306 "fewer false negatives but may impact compiler and/or "
307 "application performance. N.B. LLVM switch instructions do not "
308 "correspond exactly to C++ switch statements."),
309 cl::Hidden, cl::init(99));
310
312 "msan-handle-lifetime-intrinsics",
313 cl::desc(
314 "when possible, poison scoped variables at the beginning of the scope "
315 "(slower, but more precise)"),
316 cl::Hidden, cl::init(true));
317
318// When compiling the Linux kernel, we sometimes see false positives related to
319// MSan being unable to understand that inline assembly calls may initialize
320// local variables.
321// This flag makes the compiler conservatively unpoison every memory location
322// passed into an assembly call. Note that this may cause false positives.
323// Because it's impossible to figure out the array sizes, we can only unpoison
324// the first sizeof(type) bytes for each type* pointer.
326 "msan-handle-asm-conservative",
327 cl::desc("conservative handling of inline assembly"), cl::Hidden,
328 cl::init(true));
329
330// This flag controls whether we check the shadow of the address
331// operand of load or store. Such bugs are very rare, since load from
332// a garbage address typically results in SEGV, but still happen
333// (e.g. only the lower bits of the address are garbage, or the access happens
334// early at program startup where malloc-ed memory is more likely to
335// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
337 "msan-check-access-address",
338 cl::desc("report accesses through a pointer which has poisoned shadow"),
339 cl::Hidden, cl::init(true));
340
342 "msan-eager-checks",
343 cl::desc("check arguments and return values at function call boundaries"),
344 cl::Hidden, cl::init(false));
345
347 "msan-dump-strict-instructions",
348 cl::desc("print out instructions with default strict semantics i.e.,"
349 "check that all the inputs are fully initialized, and mark "
350 "the output as fully initialized. These semantics are applied "
351 "to instructions that could not be handled explicitly nor "
352 "heuristically."),
353 cl::Hidden, cl::init(false));
354
355// Currently, all the heuristically handled instructions are specifically
356// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
357// to parallel 'msan-dump-strict-instructions', and to keep the door open to
358// handling non-intrinsic instructions heuristically.
360 "msan-dump-heuristic-instructions",
361 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
362 "Use -msan-dump-strict-instructions to print instructions that "
363 "could not be handled explicitly nor heuristically."),
364 cl::Hidden, cl::init(false));
365
367 "msan-instrumentation-with-call-threshold",
368 cl::desc(
369 "If the function being instrumented requires more than "
370 "this number of checks and origin stores, use callbacks instead of "
371 "inline checks (-1 means never use callbacks)."),
372 cl::Hidden, cl::init(3500));
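// For example, passing -mllvm -msan-instrumentation-with-call-threshold=0
// makes userspace checks (and origin stores) on non-constant shadow use the
// __msan_maybe_* callbacks above instead of inline code.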
373
374static cl::opt<bool>
375 ClEnableKmsan("msan-kernel",
376 cl::desc("Enable KernelMemorySanitizer instrumentation"),
377 cl::Hidden, cl::init(false));
378
379static cl::opt<bool>
380 ClDisableChecks("msan-disable-checks",
381 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
382 cl::init(false));
383
384static cl::opt<bool>
385 ClCheckConstantShadow("msan-check-constant-shadow",
386 cl::desc("Insert checks for constant shadow values"),
387 cl::Hidden, cl::init(true));
388
389// This is off by default because of a bug in gold:
390// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
391static cl::opt<bool>
392 ClWithComdat("msan-with-comdat",
393 cl::desc("Place MSan constructors in comdat sections"),
394 cl::Hidden, cl::init(false));
395
396// These options allow specifying custom memory map parameters.
397// See MemoryMapParams for details.
398static cl::opt<uint64_t> ClAndMask("msan-and-mask",
399 cl::desc("Define custom MSan AndMask"),
400 cl::Hidden, cl::init(0));
401
402static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
403 cl::desc("Define custom MSan XorMask"),
404 cl::Hidden, cl::init(0));
405
406static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
407 cl::desc("Define custom MSan ShadowBase"),
408 cl::Hidden, cl::init(0));
409
410static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
411 cl::desc("Define custom MSan OriginBase"),
412 cl::Hidden, cl::init(0));
413
414static cl::opt<int>
415 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
416 cl::desc("Define threshold for number of checks per "
417 "debug location to force origin update."),
418 cl::Hidden, cl::init(3));
419
420const char kMsanModuleCtorName[] = "msan.module_ctor";
421const char kMsanInitName[] = "__msan_init";
422
423namespace {
424
425// Memory map parameters used in application-to-shadow address calculation.
426// Offset = (Addr & ~AndMask) ^ XorMask
427// Shadow = ShadowBase + Offset
428// Origin = OriginBase + Offset
429struct MemoryMapParams {
430 uint64_t AndMask;
431 uint64_t XorMask;
432 uint64_t ShadowBase;
433 uint64_t OriginBase;
434};
435
436struct PlatformMemoryMapParams {
437 const MemoryMapParams *bits32;
438 const MemoryMapParams *bits64;
439};
440
441} // end anonymous namespace
442
443// i386 Linux
444static const MemoryMapParams Linux_I386_MemoryMapParams = {
445 0x000080000000, // AndMask
446 0, // XorMask (not used)
447 0, // ShadowBase (not used)
448 0x000040000000, // OriginBase
449};
450
451// x86_64 Linux
452static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
453 0, // AndMask (not used)
454 0x500000000000, // XorMask
455 0, // ShadowBase (not used)
456 0x100000000000, // OriginBase
457};
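// For illustration, with the parameters above an application address
// Addr = 0x700000001000 maps to:
//   Offset = (Addr & ~0) ^ 0x500000000000 = 0x200000001000
//   Shadow = 0x000000000000 + Offset      = 0x200000001000
//   Origin = 0x100000000000 + Offset      = 0x300000001000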
458
459// mips32 Linux
460// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
461// after picking good constants
462
463// mips64 Linux
464static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
465 0, // AndMask (not used)
466 0x008000000000, // XorMask
467 0, // ShadowBase (not used)
468 0x002000000000, // OriginBase
469};
470
471// ppc32 Linux
472// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
473// after picking good constants
474
475// ppc64 Linux
476static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
477 0xE00000000000, // AndMask
478 0x100000000000, // XorMask
479 0x080000000000, // ShadowBase
480 0x1C0000000000, // OriginBase
481};
482
483// s390x Linux
484static const MemoryMapParams Linux_S390X_MemoryMapParams = {
485 0xC00000000000, // AndMask
486 0, // XorMask (not used)
487 0x080000000000, // ShadowBase
488 0x1C0000000000, // OriginBase
489};
490
491// arm32 Linux
492// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
493// after picking good constants
494
495// aarch64 Linux
496static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
497 0, // AndMask (not used)
498 0x0B00000000000, // XorMask
499 0, // ShadowBase (not used)
500 0x0200000000000, // OriginBase
501};
502
503// loongarch64 Linux
504static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
505 0, // AndMask (not used)
506 0x500000000000, // XorMask
507 0, // ShadowBase (not used)
508 0x100000000000, // OriginBase
509};
510
511// hexagon Linux
512static const MemoryMapParams Linux_Hexagon_MemoryMapParams = {
513 0, // AndMask (not used)
514 0x20000000, // XorMask
515 0, // ShadowBase (not used)
516 0x50000000, // OriginBase
517};
518
519// riscv32 Linux
520// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
521// after picking good constants
522
523// aarch64 FreeBSD
524static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
525 0x1800000000000, // AndMask
526 0x0400000000000, // XorMask
527 0x0200000000000, // ShadowBase
528 0x0700000000000, // OriginBase
529};
530
531// i386 FreeBSD
532static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
533 0x000180000000, // AndMask
534 0x000040000000, // XorMask
535 0x000020000000, // ShadowBase
536 0x000700000000, // OriginBase
537};
538
539// x86_64 FreeBSD
540static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
541 0xc00000000000, // AndMask
542 0x200000000000, // XorMask
543 0x100000000000, // ShadowBase
544 0x380000000000, // OriginBase
545};
546
547// x86_64 NetBSD
548static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
549 0, // AndMask
550 0x500000000000, // XorMask
551 0, // ShadowBase
552 0x100000000000, // OriginBase
553};
554
555static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
556 &Linux_I386_MemoryMapParams,
557 &Linux_X86_64_MemoryMapParams,
558};
559
560static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
561 nullptr,
562 &Linux_MIPS64_MemoryMapParams,
563};
564
565static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
566 nullptr,
567 &Linux_PowerPC64_MemoryMapParams,
568};
569
570static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
571 nullptr,
572 &Linux_S390X_MemoryMapParams,
573};
574
575static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
576 nullptr,
577 &Linux_AArch64_MemoryMapParams,
578};
579
580static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
581 nullptr,
582 &Linux_LoongArch64_MemoryMapParams,
583};
584
585static const PlatformMemoryMapParams Linux_Hexagon_MemoryMapParams_P = {
586 &Linux_Hexagon_MemoryMapParams,
587 nullptr,
588};
589
590static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
591 nullptr,
592 &FreeBSD_AArch64_MemoryMapParams,
593};
594
595static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
596 &FreeBSD_I386_MemoryMapParams,
597 &FreeBSD_X86_64_MemoryMapParams,
598};
599
600static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
601 nullptr,
602 &NetBSD_X86_64_MemoryMapParams,
603};
604
606
607namespace {
608
609/// Instrument functions of a module to detect uninitialized reads.
610///
611/// Instantiating MemorySanitizer inserts the msan runtime library API function
612/// declarations into the module if they don't exist already. Instantiating
613/// ensures the __msan_init function is in the list of global constructors for
614/// the module.
615class MemorySanitizer {
616public:
617 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
618 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
619 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
620 initializeModule(M);
621 }
622
623 // MSan cannot be moved or copied because of MapParams.
624 MemorySanitizer(MemorySanitizer &&) = delete;
625 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
626 MemorySanitizer(const MemorySanitizer &) = delete;
627 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
628
629 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
630
631private:
632 friend struct MemorySanitizerVisitor;
633 friend struct VarArgHelperBase;
634 friend struct VarArgAMD64Helper;
635 friend struct VarArgAArch64Helper;
636 friend struct VarArgPowerPC64Helper;
637 friend struct VarArgPowerPC32Helper;
638 friend struct VarArgSystemZHelper;
639 friend struct VarArgI386Helper;
640 friend struct VarArgGenericHelper;
641
642 void initializeModule(Module &M);
643 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
644 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
645 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
646
647 template <typename... ArgsTy>
648 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
649 ArgsTy... Args);
650
651 /// True if we're compiling the Linux kernel.
652 bool CompileKernel;
653 /// Track origins (allocation points) of uninitialized values.
654 int TrackOrigins;
655 bool Recover;
656 bool EagerChecks;
657
658 Triple TargetTriple;
659 LLVMContext *C;
660 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
661 Type *OriginTy;
662 PointerType *PtrTy; ///< Pointer type in the default address space.
663
664 // XxxTLS variables represent the per-thread state in MSan and per-task state
665 // in KMSAN.
666 // For the userspace these point to thread-local globals. In the kernel land
667 // they point to the members of a per-task struct obtained via a call to
668 // __msan_get_context_state().
669
670 /// Thread-local shadow storage for function parameters.
671 Value *ParamTLS;
672
673 /// Thread-local origin storage for function parameters.
674 Value *ParamOriginTLS;
675
676 /// Thread-local shadow storage for function return value.
677 Value *RetvalTLS;
678
679 /// Thread-local origin storage for function return value.
680 Value *RetvalOriginTLS;
681
682 /// Thread-local shadow storage for in-register va_arg function.
683 Value *VAArgTLS;
684
685 /// Thread-local origin storage for in-register va_arg function.
686 Value *VAArgOriginTLS;
687
688 /// Thread-local storage for the size of the va_arg overflow area.
689 Value *VAArgOverflowSizeTLS;
690
691 /// Are the instrumentation callbacks set up?
692 bool CallbacksInitialized = false;
693
694 /// The run-time callback to print a warning.
695 FunctionCallee WarningFn;
696
697 // These arrays are indexed by log2(AccessSize).
698 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
699 FunctionCallee MaybeWarningVarSizeFn;
700 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
701
702 /// Run-time helper that generates a new origin value for a stack
703 /// allocation.
704 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
705 // No description version
706 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
707
708 /// Run-time helper that poisons stack on function entry.
709 FunctionCallee MsanPoisonStackFn;
710
711 /// Run-time helper that records a store (or any event) of an
712 /// uninitialized value and returns an updated origin id encoding this info.
713 FunctionCallee MsanChainOriginFn;
714
715 /// Run-time helper that paints an origin over a region.
716 FunctionCallee MsanSetOriginFn;
717
718 /// MSan runtime replacements for memmove, memcpy and memset.
719 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
720
721 /// KMSAN callback for task-local function argument shadow.
722 StructType *MsanContextStateTy;
723 FunctionCallee MsanGetContextStateFn;
724
725 /// Functions for poisoning/unpoisoning local variables
726 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
727
728 /// Pair of shadow/origin pointers.
729 Type *MsanMetadata;
730
731 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
732 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
733 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
734 FunctionCallee MsanMetadataPtrForStore_1_8[4];
735 FunctionCallee MsanInstrumentAsmStoreFn;
736
737 /// Storage for return values of the MsanMetadataPtrXxx functions.
738 Value *MsanMetadataAlloca;
739
740 /// Helper to choose between different MsanMetadataPtrXxx().
741 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
742
743 /// Memory map parameters used in application-to-shadow calculation.
744 const MemoryMapParams *MapParams;
745
746 /// Custom memory map parameters used when -msan-shadow-base or
747 /// -msan-origin-base is provided.
748 MemoryMapParams CustomMapParams;
749
750 MDNode *ColdCallWeights;
751
752 /// Branch weights for origin store.
753 MDNode *OriginStoreWeights;
754};
755
756void insertModuleCtor(Module &M) {
759 /*InitArgTypes=*/{},
760 /*InitArgs=*/{},
761 // This callback is invoked when the functions are created the first
762 // time. Hook them into the global ctors list in that case:
763 [&](Function *Ctor, FunctionCallee) {
764 if (!ClWithComdat) {
765 appendToGlobalCtors(M, Ctor, 0);
766 return;
767 }
768 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
769 Ctor->setComdat(MsanCtorComdat);
770 appendToGlobalCtors(M, Ctor, 0, Ctor);
771 });
772}
773
774template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
775 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
776}
777
778} // end anonymous namespace
779
781 bool EagerChecks)
782 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
783 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
784 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
785 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
786
789 // Return early if nosanitize_memory module flag is present for the module.
790 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
791 return PreservedAnalyses::all();
792 bool Modified = false;
793 if (!Options.Kernel) {
794 insertModuleCtor(M);
795 Modified = true;
796 }
797
798 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
799 for (Function &F : M) {
800 if (F.empty())
801 continue;
802 MemorySanitizer Msan(*F.getParent(), Options);
803 Modified |=
804 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
805 }
806
807 if (!Modified)
808 return PreservedAnalyses::all();
809
810 PreservedAnalyses PA = PreservedAnalyses::none();
811 // GlobalsAA is considered stateless and does not get invalidated unless
812 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
813 // make changes that require GlobalsAA to be invalidated.
814 PA.abandon<GlobalsAA>();
815 return PA;
816}
817
819 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
821 OS, MapClassName2PassName);
822 OS << '<';
823 if (Options.Recover)
824 OS << "recover;";
825 if (Options.Kernel)
826 OS << "kernel;";
827 if (Options.EagerChecks)
828 OS << "eager-checks;";
829 OS << "track-origins=" << Options.TrackOrigins;
830 OS << '>';
831}
832
833/// Create a non-const global initialized with the given string.
834///
835/// Creates a writable global for Str so that we can pass it to the
836/// run-time lib. Runtime uses first 4 bytes of the string to store the
837/// frame ID, so the string needs to be mutable.
839 StringRef Str) {
840 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
841 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
842 GlobalValue::PrivateLinkage, StrConst, "");
843}
844
845template <typename... ArgsTy>
847MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
848 ArgsTy... Args) {
849 if (TargetTriple.getArch() == Triple::systemz) {
850 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
851 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
852 std::forward<ArgsTy>(Args)...);
853 }
854
855 return M.getOrInsertFunction(Name, MsanMetadata,
856 std::forward<ArgsTy>(Args)...);
857}
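// For reference, the declarations produced above are conceptually:
//   most targets: { ptr, ptr } @__msan_metadata_ptr_for_load_1(ptr)
//   SystemZ:      void @__msan_metadata_ptr_for_load_1(ptr /*result*/, ptr)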
858
859/// Create KMSAN API callbacks.
860void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
861 IRBuilder<> IRB(*C);
862
863 // These will be initialized in insertKmsanPrologue().
864 RetvalTLS = nullptr;
865 RetvalOriginTLS = nullptr;
866 ParamTLS = nullptr;
867 ParamOriginTLS = nullptr;
868 VAArgTLS = nullptr;
869 VAArgOriginTLS = nullptr;
870 VAArgOverflowSizeTLS = nullptr;
871
872 WarningFn = M.getOrInsertFunction("__msan_warning",
873 TLI.getAttrList(C, {0}, /*Signed=*/false),
874 IRB.getVoidTy(), IRB.getInt32Ty());
875
876 // Requests the per-task context state (kmsan_context_state*) from the
877 // runtime library.
878 MsanContextStateTy = StructType::get(
879 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
880 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
881 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
882 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
883 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
884 OriginTy);
885 MsanGetContextStateFn =
886 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
887
888 MsanMetadata = StructType::get(PtrTy, PtrTy);
889
890 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
891 std::string name_load =
892 "__msan_metadata_ptr_for_load_" + std::to_string(size);
893 std::string name_store =
894 "__msan_metadata_ptr_for_store_" + std::to_string(size);
895 MsanMetadataPtrForLoad_1_8[ind] =
896 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
897 MsanMetadataPtrForStore_1_8[ind] =
898 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
899 }
900
901 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
902 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
903 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
904 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
905
906 // Functions for poisoning and unpoisoning memory.
907 MsanPoisonAllocaFn = M.getOrInsertFunction(
908 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
909 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
910 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
911}
912
914 return M.getOrInsertGlobal(Name, Ty, [&] {
915 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
916 nullptr, Name, nullptr,
918 });
919}
920
921/// Insert declarations for userspace-specific functions and globals.
922void MemorySanitizer::createUserspaceApi(Module &M,
923 const TargetLibraryInfo &TLI) {
924 IRBuilder<> IRB(*C);
925
926 // Create the callback.
927 // FIXME: this function should have "Cold" calling conv,
928 // which is not yet implemented.
929 if (TrackOrigins) {
930 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
931 : "__msan_warning_with_origin_noreturn";
932 WarningFn = M.getOrInsertFunction(WarningFnName,
933 TLI.getAttrList(C, {0}, /*Signed=*/false),
934 IRB.getVoidTy(), IRB.getInt32Ty());
935 } else {
936 StringRef WarningFnName =
937 Recover ? "__msan_warning" : "__msan_warning_noreturn";
938 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
939 }
940
941 // Create the global TLS variables.
942 RetvalTLS =
943 getOrInsertGlobal(M, "__msan_retval_tls",
944 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
945
946 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
947
948 ParamTLS =
949 getOrInsertGlobal(M, "__msan_param_tls",
950 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
951
952 ParamOriginTLS =
953 getOrInsertGlobal(M, "__msan_param_origin_tls",
954 ArrayType::get(OriginTy, kParamTLSSize / 4));
955
956 VAArgTLS =
957 getOrInsertGlobal(M, "__msan_va_arg_tls",
958 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
959
960 VAArgOriginTLS =
961 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
962 ArrayType::get(OriginTy, kParamTLSSize / 4));
963
964 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
965 IRB.getIntPtrTy(M.getDataLayout()));
966
967 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
968 AccessSizeIndex++) {
969 unsigned AccessSize = 1 << AccessSizeIndex;
970 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
971 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
972 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
973 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
974 MaybeWarningVarSizeFn = M.getOrInsertFunction(
975 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
976 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
977 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
978 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
979 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
980 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
981 IRB.getInt32Ty());
982 }
983
984 MsanSetAllocaOriginWithDescriptionFn =
985 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
986 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
987 MsanSetAllocaOriginNoDescriptionFn =
988 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
989 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
990 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
991 IRB.getVoidTy(), PtrTy, IntptrTy);
992}
993
994/// Insert extern declaration of runtime-provided functions and globals.
995void MemorySanitizer::initializeCallbacks(Module &M,
996 const TargetLibraryInfo &TLI) {
997 // Only do this once.
998 if (CallbacksInitialized)
999 return;
1000
1001 IRBuilder<> IRB(*C);
1002 // Initialize callbacks that are common for kernel and userspace
1003 // instrumentation.
1004 MsanChainOriginFn = M.getOrInsertFunction(
1005 "__msan_chain_origin",
1006 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
1007 IRB.getInt32Ty());
1008 MsanSetOriginFn = M.getOrInsertFunction(
1009 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
1010 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
1011 MemmoveFn =
1012 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
1013 MemcpyFn =
1014 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
1015 MemsetFn = M.getOrInsertFunction("__msan_memset",
1016 TLI.getAttrList(C, {1}, /*Signed=*/true),
1017 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
1018
1019 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
1020 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
1021
1022 if (CompileKernel) {
1023 createKernelApi(M, TLI);
1024 } else {
1025 createUserspaceApi(M, TLI);
1026 }
1027 CallbacksInitialized = true;
1028}
1029
1030FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1031 int size) {
1032 FunctionCallee *Fns =
1033 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1034 switch (size) {
1035 case 1:
1036 return Fns[0];
1037 case 2:
1038 return Fns[1];
1039 case 4:
1040 return Fns[2];
1041 case 8:
1042 return Fns[3];
1043 default:
1044 return nullptr;
1045 }
1046}
1047
1048/// Module-level initialization.
1049///
1050/// inserts a call to __msan_init to the module's constructor list.
1051void MemorySanitizer::initializeModule(Module &M) {
1052 auto &DL = M.getDataLayout();
1053
1054 TargetTriple = M.getTargetTriple();
1055
1056 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1057 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1058 // Check the overrides first
1059 if (ShadowPassed || OriginPassed) {
1060 CustomMapParams.AndMask = ClAndMask;
1061 CustomMapParams.XorMask = ClXorMask;
1062 CustomMapParams.ShadowBase = ClShadowBase;
1063 CustomMapParams.OriginBase = ClOriginBase;
1064 MapParams = &CustomMapParams;
1065 } else {
1066 switch (TargetTriple.getOS()) {
1067 case Triple::FreeBSD:
1068 switch (TargetTriple.getArch()) {
1069 case Triple::aarch64:
1070 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1071 break;
1072 case Triple::x86_64:
1073 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1074 break;
1075 case Triple::x86:
1076 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1077 break;
1078 default:
1079 report_fatal_error("unsupported architecture");
1080 }
1081 break;
1082 case Triple::NetBSD:
1083 switch (TargetTriple.getArch()) {
1084 case Triple::x86_64:
1085 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1086 break;
1087 default:
1088 report_fatal_error("unsupported architecture");
1089 }
1090 break;
1091 case Triple::Linux:
1092 switch (TargetTriple.getArch()) {
1093 case Triple::x86_64:
1094 MapParams = Linux_X86_MemoryMapParams.bits64;
1095 break;
1096 case Triple::x86:
1097 MapParams = Linux_X86_MemoryMapParams.bits32;
1098 break;
1099 case Triple::mips64:
1100 case Triple::mips64el:
1101 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1102 break;
1103 case Triple::ppc64:
1104 case Triple::ppc64le:
1105 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1106 break;
1107 case Triple::systemz:
1108 MapParams = Linux_S390_MemoryMapParams.bits64;
1109 break;
1110 case Triple::aarch64:
1111 case Triple::aarch64_be:
1112 MapParams = Linux_ARM_MemoryMapParams.bits64;
1113 break;
1114 case Triple::loongarch64:
1115 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1116 break;
1117 case Triple::hexagon:
1118 MapParams = Linux_Hexagon_MemoryMapParams_P.bits32;
1119 break;
1120 default:
1121 report_fatal_error("unsupported architecture");
1122 }
1123 break;
1124 default:
1125 report_fatal_error("unsupported operating system");
1126 }
1127 }
1128
1129 C = &(M.getContext());
1130 IRBuilder<> IRB(*C);
1131 IntptrTy = IRB.getIntPtrTy(DL);
1132 OriginTy = IRB.getInt32Ty();
1133 PtrTy = IRB.getPtrTy();
1134
1135 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1136 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1137
1138 if (!CompileKernel) {
1139 if (TrackOrigins)
1140 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1141 return new GlobalVariable(
1142 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1143 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1144 });
1145
1146 if (Recover)
1147 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1148 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1149 GlobalValue::WeakODRLinkage,
1150 IRB.getInt32(Recover), "__msan_keep_going");
1151 });
1152 }
1153}
1154
1155namespace {
1156
1157/// A helper class that handles instrumentation of VarArg
1158/// functions on a particular platform.
1159///
1160/// Implementations are expected to insert the instrumentation
1161/// necessary to propagate argument shadow through VarArg function
1162/// calls. Visit* methods are called during an InstVisitor pass over
1163/// the function, and should avoid creating new basic blocks. A new
1164/// instance of this class is created for each instrumented function.
1165struct VarArgHelper {
1166 virtual ~VarArgHelper() = default;
1167
1168 /// Visit a CallBase.
1169 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1170
1171 /// Visit a va_start call.
1172 virtual void visitVAStartInst(VAStartInst &I) = 0;
1173
1174 /// Visit a va_copy call.
1175 virtual void visitVACopyInst(VACopyInst &I) = 0;
1176
1177 /// Finalize function instrumentation.
1178 ///
1179 /// This method is called after visiting all interesting (see above)
1180 /// instructions in a function.
1181 virtual void finalizeInstrumentation() = 0;
1182};
1183
1184struct MemorySanitizerVisitor;
1185
1186} // end anonymous namespace
1187
1188static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1189 MemorySanitizerVisitor &Visitor);
1190
1191static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1192 if (TS.isScalable())
1193 // Scalable types unconditionally take slowpaths.
1194 return kNumberOfAccessSizes;
1195 unsigned TypeSizeFixed = TS.getFixedValue();
1196 if (TypeSizeFixed <= 8)
1197 return 0;
1198 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1199}
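// For example, a fixed 32-bit (4-byte) type yields Log2_32_Ceil(4) == 2, the
// index of the 4-byte helpers (e.g. __msan_maybe_warning_4), while scalable
// types map to kNumberOfAccessSizes and therefore take the generic slow path.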
1200
1201namespace {
1202
1203/// Helper class to attach debug information of the given instruction onto new
1204/// instructions inserted after it.
1205class NextNodeIRBuilder : public IRBuilder<> {
1206public:
1207 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1208 SetCurrentDebugLocation(IP->getDebugLoc());
1209 }
1210};
1211
1212/// This class does all the work for a given function. Store and Load
1213/// instructions store and load corresponding shadow and origin
1214/// values. Most instructions propagate shadow from arguments to their
1215/// return values. Certain instructions (most importantly, BranchInst)
1216/// test their argument shadow and print reports (with a runtime call) if it's
1217/// non-zero.
1218struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1219 Function &F;
1220 MemorySanitizer &MS;
1221 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1222 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1223 std::unique_ptr<VarArgHelper> VAHelper;
1224 const TargetLibraryInfo *TLI;
1225 Instruction *FnPrologueEnd;
1226 SmallVector<Instruction *, 16> Instructions;
1227
1228 // The following flags disable parts of MSan instrumentation based on
1229 // exclusion list contents and command-line options.
1230 bool InsertChecks;
1231 bool PropagateShadow;
1232 bool PoisonStack;
1233 bool PoisonUndef;
1234 bool PoisonUndefVectors;
1235
1236 struct ShadowOriginAndInsertPoint {
1237 Value *Shadow;
1238 Value *Origin;
1239 Instruction *OrigIns;
1240
1241 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1242 : Shadow(S), Origin(O), OrigIns(I) {}
1243 };
1245 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1246 SmallSetVector<AllocaInst *, 16> AllocaSet;
1249 int64_t SplittableBlocksCount = 0;
1250
1251 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1252 const TargetLibraryInfo &TLI)
1253 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1254 bool SanitizeFunction =
1255 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1256 InsertChecks = SanitizeFunction;
1257 PropagateShadow = SanitizeFunction;
1258 PoisonStack = SanitizeFunction && ClPoisonStack;
1259 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1260 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1261
1262 // In the presence of unreachable blocks, we may see Phi nodes with
1263 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1264 // blocks, such nodes will not have any shadow value associated with them.
1265 // It's easier to remove unreachable blocks than deal with missing shadow.
1266 removeUnreachableBlocks(F);
1267
1268 MS.initializeCallbacks(*F.getParent(), TLI);
1269 FnPrologueEnd =
1270 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1271 .CreateIntrinsic(Intrinsic::donothing, {});
1272
1273 if (MS.CompileKernel) {
1274 IRBuilder<> IRB(FnPrologueEnd);
1275 insertKmsanPrologue(IRB);
1276 }
1277
1278 LLVM_DEBUG(if (!InsertChecks) dbgs()
1279 << "MemorySanitizer is not inserting checks into '"
1280 << F.getName() << "'\n");
1281 }
1282
1283 bool instrumentWithCalls(Value *V) {
1284 // Constants likely will be eliminated by follow-up passes.
1285 if (isa<Constant>(V))
1286 return false;
1287 ++SplittableBlocksCount;
1289 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1290 }
1291
1292 bool isInPrologue(Instruction &I) {
1293 return I.getParent() == FnPrologueEnd->getParent() &&
1294 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1295 }
1296
1297 // Creates a new origin and records the stack trace. In general we can call
1298 // this function for any origin manipulation we like. However it will cost
1299 // runtime resources. So use this wisely only if it can provide additional
1300 // information helpful to a user.
1301 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1302 if (MS.TrackOrigins <= 1)
1303 return V;
1304 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1305 }
1306
1307 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1308 const DataLayout &DL = F.getDataLayout();
1309 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1310 if (IntptrSize == kOriginSize)
1311 return Origin;
1312 assert(IntptrSize == kOriginSize * 2);
1313 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1314 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1315 }
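  // For example, on a 64-bit target the 32-bit origin 0xAABBCCDD is widened
  // to the repeated pattern 0xAABBCCDDAABBCCDD, letting paintOrigin() below
  // fill a region with pointer-sized stores.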
1316
1317 /// Fill memory range with the given origin value.
1318 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1319 TypeSize TS, Align Alignment) {
1320 const DataLayout &DL = F.getDataLayout();
1321 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1322 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1323 assert(IntptrAlignment >= kMinOriginAlignment);
1324 assert(IntptrSize >= kOriginSize);
1325
1326 // Note: The loop-based formulation works for fixed-length vectors too;
1327 // however, we prefer to unroll and specialize alignment below.
1328 if (TS.isScalable()) {
1329 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1330 Value *RoundUp =
1331 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1332 Value *End =
1333 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1334 auto [InsertPt, Index] =
1336 IRB.SetInsertPoint(InsertPt);
1337
1338 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1340 return;
1341 }
1342
1343 unsigned Size = TS.getFixedValue();
1344
1345 unsigned Ofs = 0;
1346 Align CurrentAlignment = Alignment;
1347 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1348 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1349 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1350 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1351 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1352 : IntptrOriginPtr;
1353 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1354 Ofs += IntptrSize / kOriginSize;
1355 CurrentAlignment = IntptrAlignment;
1356 }
1357 }
1358
1359 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1360 Value *GEP =
1361 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1362 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1363 CurrentAlignment = kMinOriginAlignment;
1364 }
1365 }
1366
1367 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1368 Value *OriginPtr, Align Alignment) {
1369 const DataLayout &DL = F.getDataLayout();
1370 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1371 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1372 // ZExt cannot convert between vector and scalar
1373 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1374 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1375 if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
1376 // Origin is not needed: value is initialized or const shadow is
1377 // ignored.
1378 return;
1379 }
1380 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1381 // Copy origin as the value is definitely uninitialized.
1382 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1383 OriginAlignment);
1384 return;
1385 }
1386 // Fall back to a runtime check, which can still be optimized out later.
1387 }
1388
1389 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1390 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1391 if (instrumentWithCalls(ConvertedShadow) &&
1392 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1393 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1394 Value *ConvertedShadow2 =
1395 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1396 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1397 CB->addParamAttr(0, Attribute::ZExt);
1398 CB->addParamAttr(2, Attribute::ZExt);
1399 } else {
1400 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1402 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1403 IRBuilder<> IRBNew(CheckTerm);
1404 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1405 OriginAlignment);
1406 }
1407 }
1408
1409 void materializeStores() {
1410 for (StoreInst *SI : StoreList) {
1411 IRBuilder<> IRB(SI);
1412 Value *Val = SI->getValueOperand();
1413 Value *Addr = SI->getPointerOperand();
1414 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1415 Value *ShadowPtr, *OriginPtr;
1416 Type *ShadowTy = Shadow->getType();
1417 const Align Alignment = SI->getAlign();
1418 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1419 std::tie(ShadowPtr, OriginPtr) =
1420 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1421
1422 [[maybe_unused]] StoreInst *NewSI =
1423 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1424 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1425
1426 if (SI->isAtomic())
1427 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1428
1429 if (MS.TrackOrigins && !SI->isAtomic())
1430 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1431 OriginAlignment);
1432 }
1433 }
1434
1435 // Returns true if Debug Location corresponds to multiple warnings.
1436 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1437 if (MS.TrackOrigins < 2)
1438 return false;
1439
1440 if (LazyWarningDebugLocationCount.empty())
1441 for (const auto &I : InstrumentationList)
1442 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1443
1444 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1445 }
1446
1447 /// Helper function to insert a warning at IRB's current insert point.
1448 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1449 if (!Origin)
1450 Origin = (Value *)IRB.getInt32(0);
1451 assert(Origin->getType()->isIntegerTy());
1452
1453 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1454 // Try to create additional origin with debug info of the last origin
1455 // instruction. It may provide additional information to the user.
1456 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1457 assert(MS.TrackOrigins);
1458 auto NewDebugLoc = OI->getDebugLoc();
1459 // Origin update with missing or the same debug location provides no
1460 // additional value.
1461 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1462 // Insert update just before the check, so we call runtime only just
1463 // before the report.
1464 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1465 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1466 Origin = updateOrigin(Origin, IRBOrigin);
1467 }
1468 }
1469 }
1470
1471 if (MS.CompileKernel || MS.TrackOrigins)
1472 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1473 else
1474 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1475 // FIXME: Insert UnreachableInst if !MS.Recover?
1476 // This may invalidate some of the following checks and needs to be done
1477 // at the very end.
1478 }
1479
1480 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1481 Value *Origin) {
1482 const DataLayout &DL = F.getDataLayout();
1483 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1484 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1485 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1486 // ZExt cannot convert between vector and scalar
1487 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1488 Value *ConvertedShadow2 =
1489 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1490
1491 if (SizeIndex < kNumberOfAccessSizes) {
1492 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1493 CallBase *CB = IRB.CreateCall(
1494 Fn,
1495 {ConvertedShadow2,
1496 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1497 CB->addParamAttr(0, Attribute::ZExt);
1498 CB->addParamAttr(1, Attribute::ZExt);
1499 } else {
1500 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1501 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1502 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1503 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1504 CallBase *CB = IRB.CreateCall(
1505 Fn,
1506 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1507 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1508 CB->addParamAttr(1, Attribute::ZExt);
1509 CB->addParamAttr(2, Attribute::ZExt);
1510 }
1511 } else {
1512 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1514 Cmp, &*IRB.GetInsertPoint(),
1515 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1516
1517 IRB.SetInsertPoint(CheckTerm);
1518 insertWarningFn(IRB, Origin);
1519 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1520 }
1521 }
1522
1523 void materializeInstructionChecks(
1524 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1525 const DataLayout &DL = F.getDataLayout();
1526 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1527 // the correct origin.
1528 bool Combine = !MS.TrackOrigins;
1529 Instruction *Instruction = InstructionChecks.front().OrigIns;
1530 Value *Shadow = nullptr;
1531 for (const auto &ShadowData : InstructionChecks) {
1532 assert(ShadowData.OrigIns == Instruction);
1533 IRBuilder<> IRB(Instruction);
1534
1535 Value *ConvertedShadow = ShadowData.Shadow;
1536
1537 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1538 if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
1539 // Skip, value is initialized or const shadow is ignored.
1540 continue;
1541 }
1542 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1543 // Report as the value is definitely uninitialized.
1544 insertWarningFn(IRB, ShadowData.Origin);
1545 if (!MS.Recover)
1546 return; // Always fail and stop here, no need to check the rest.
1547 // Skip the entire instruction.
1548 continue;
1549 }
1550 // Fall back to a runtime check, which can still be optimized out later.
1551 }
1552
1553 if (!Combine) {
1554 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1555 continue;
1556 }
1557
1558 if (!Shadow) {
1559 Shadow = ConvertedShadow;
1560 continue;
1561 }
1562
1563 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1564 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1565 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1566 }
1567
1568 if (Shadow) {
1569 assert(Combine);
1570 IRBuilder<> IRB(Instruction);
1571 materializeOneCheck(IRB, Shadow, nullptr);
1572 }
1573 }
1574
1575 static bool isAArch64SVCount(Type *Ty) {
1576 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1577 return TTy->getName() == "aarch64.svcount";
1578 return false;
1579 }
1580
1581 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1582 // 'target("aarch64.svcount")'), but not e.g., <vscale x 4 x i32>.
1583 static bool isScalableNonVectorType(Type *Ty) {
1584 if (!isAArch64SVCount(Ty))
1585 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1586 << "\n");
1587
1588 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1589 }
1590
1591 void materializeChecks() {
1592#ifndef NDEBUG
1593 // For assert below.
1594 SmallPtrSet<Instruction *, 16> Done;
1595#endif
1596
1597 for (auto I = InstrumentationList.begin();
1598 I != InstrumentationList.end();) {
1599 auto OrigIns = I->OrigIns;
1600 // Checks are grouped by the original instruction. We call all
1601 // `insertShadowCheck` for an instruction at once.
1602 assert(Done.insert(OrigIns).second);
1603 auto J = std::find_if(I + 1, InstrumentationList.end(),
1604 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1605 return OrigIns != R.OrigIns;
1606 });
1607 // Process all checks of the instruction at once.
1608 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1609 I = J;
1610 }
1611
1612 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1613 }
1614
1615 // Inserts the KMSAN function prologue that loads the per-task context state.
1616 void insertKmsanPrologue(IRBuilder<> &IRB) {
1617 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1618 Constant *Zero = IRB.getInt32(0);
1619 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1620 {Zero, IRB.getInt32(0)}, "param_shadow");
1621 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1622 {Zero, IRB.getInt32(1)}, "retval_shadow");
1623 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1624 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1625 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1626 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1627 MS.VAArgOverflowSizeTLS =
1628 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1629 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1630 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1631 {Zero, IRB.getInt32(5)}, "param_origin");
1632 MS.RetvalOriginTLS =
1633 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1634 {Zero, IRB.getInt32(6)}, "retval_origin");
1635 if (MS.TargetTriple.getArch() == Triple::systemz)
1636 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1637 }
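// A rough sketch of the prologue this emits (KMSAN only), assuming the runtime
// entry point __msan_get_context_state and a context-state struct type
// (%state_ty below is a placeholder name) whose leading fields are the
// parameter and retval shadow arrays:
//   %state         = call ptr @__msan_get_context_state()
//   %param_shadow  = getelementptr %state_ty, ptr %state, i32 0, i32 0
//   %retval_shadow = getelementptr %state_ty, ptr %state, i32 0, i32 1
//   ; ... va_arg shadow/origin and param/retval origin fields follow similarly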
1638
1639 /// Add MemorySanitizer instrumentation to a function.
1640 bool runOnFunction() {
1641 // Iterate all BBs in depth-first order and create shadow instructions
1642 // for all instructions (where applicable).
1643 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1644 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1645 visit(*BB);
1646
1647 // `visit` above only collects instructions. Process them after iterating over
1648 // the CFG to avoid imposing requirements on CFG transformations.
1649 for (Instruction *I : Instructions)
1650 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1651
1652 // Finalize PHI nodes.
1653 for (PHINode *PN : ShadowPHINodes) {
1654 PHINode *PNS = cast<PHINode>(getShadow(PN));
1655 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1656 size_t NumValues = PN->getNumIncomingValues();
1657 for (size_t v = 0; v < NumValues; v++) {
1658 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1659 if (PNO)
1660 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1661 }
1662 }
1663
1664 VAHelper->finalizeInstrumentation();
1665
1666 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1667 // instrumenting only allocas.
1668 if (InstrumentLifetimeStart) {
1669 for (auto Item : LifetimeStartList) {
1670 instrumentAlloca(*Item.second, Item.first);
1671 AllocaSet.remove(Item.second);
1672 }
1673 }
1674 // Poison the allocas for which we didn't instrument the corresponding
1675 // lifetime intrinsics.
1676 for (AllocaInst *AI : AllocaSet)
1677 instrumentAlloca(*AI);
1678
1679 // Insert shadow value checks.
1680 materializeChecks();
1681
1682 // Delayed instrumentation of StoreInst.
1683 // This may not add new address checks.
1684 materializeStores();
1685
1686 return true;
1687 }
1688
1689 /// Compute the shadow type that corresponds to a given Value.
1690 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1691
1692 /// Compute the shadow type that corresponds to a given Type.
1693 Type *getShadowTy(Type *OrigTy) {
1694 if (!OrigTy->isSized()) {
1695 return nullptr;
1696 }
1697 // For integer type, shadow is the same as the original type.
1698 // This may return weird-sized types like i1.
1699 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1700 return IT;
1701 const DataLayout &DL = F.getDataLayout();
1702 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1703 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1704 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1705 VT->getElementCount());
1706 }
1707 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1708 return ArrayType::get(getShadowTy(AT->getElementType()),
1709 AT->getNumElements());
1710 }
1711 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1712 SmallVector<Type *, 4> Elements;
1713 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1714 Elements.push_back(getShadowTy(ST->getElementType(i)));
1715 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1716 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1717 return Res;
1718 }
1719 if (isScalableNonVectorType(OrigTy)) {
1720 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1721 << "\n");
1722 return OrigTy;
1723 }
1724
1725 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1726 return IntegerType::get(*MS.C, TypeSize);
1727 }
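// Illustrative mappings produced by getShadowTy (a sketch, assuming a 64-bit
// target DataLayout):
//   i32           ==> i32
//   <4 x float>   ==> <4 x i32>
//   [2 x double]  ==> [2 x i64]
//   { i64, ptr }  ==> { i64, i64 }   ; ptr falls through to the integer fallback
//   float         ==> i32            ; via the trailing getTypeSizeInBits path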
1728
1729 /// Extract combined shadow of struct elements as a bool
1730 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1731 IRBuilder<> &IRB) {
1732 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1733 Value *Aggregator = FalseVal;
1734
1735 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1736 // Combine by ORing together each element's bool shadow
1737 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1738 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1739
1740 if (Aggregator != FalseVal)
1741 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1742 else
1743 Aggregator = ShadowBool;
1744 }
1745
1746 return Aggregator;
1747 }
1748
1749 // Extract combined shadow of array elements
1750 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1751 IRBuilder<> &IRB) {
1752 if (!Array->getNumElements())
1753 return IRB.getIntN(/* width */ 1, /* value */ 0);
1754
1755 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1756 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1757
1758 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1759 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1760 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1761 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1762 }
1763 return Aggregator;
1764 }
1765
1766 /// Convert a shadow value to its flattened variant. The resulting
1767 /// shadow may not necessarily have the same bit width as the input
1768 /// value, but it will always be comparable to zero.
1769 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1770 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1771 return collapseStructShadow(Struct, V, IRB);
1772 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1773 return collapseArrayShadow(Array, V, IRB);
1774 if (isa<VectorType>(V->getType())) {
1775 if (isa<ScalableVectorType>(V->getType()))
1776 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1777 unsigned BitWidth =
1778 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1779 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1780 }
1781 return V;
1782 }
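// For example (a sketch): a { i32, <2 x i16> } shadow is collapsed by extracting
// both elements, bitcasting the <2 x i16> piece to i32, converting each piece to
// a bool via convertToBool, and OR-ing the bools; the result is an i1 that is
// set iff any shadow bit of the aggregate was set.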
1783
1784 // Convert a scalar value to an i1 by comparing with 0
1785 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1786 Type *VTy = V->getType();
1787 if (!VTy->isIntegerTy())
1788 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1789 if (VTy->getIntegerBitWidth() == 1)
1790 // Just converting a bool to a bool, so do nothing.
1791 return V;
1792 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1793 }
1794
1795 Type *ptrToIntPtrType(Type *PtrTy) const {
1796 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1797 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1798 VectTy->getElementCount());
1799 }
1800 assert(PtrTy->isIntOrPtrTy());
1801 return MS.IntptrTy;
1802 }
1803
1804 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1805 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1806 return VectorType::get(
1807 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1808 VectTy->getElementCount());
1809 }
1810 assert(IntPtrTy == MS.IntptrTy);
1811 return MS.PtrTy;
1812 }
1813
1814 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1815 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1816 return ConstantVector::getSplat(
1817 VectTy->getElementCount(),
1818 constToIntPtr(VectTy->getElementType(), C));
1819 }
1820 assert(IntPtrTy == MS.IntptrTy);
1821 // TODO: Avoid implicit trunc?
1822 // See https://github.com/llvm/llvm-project/issues/112510.
1823 return ConstantInt::get(MS.IntptrTy, C, /*IsSigned=*/false,
1824 /*ImplicitTrunc=*/true);
1825 }
1826
1827 /// Returns the integer shadow offset that corresponds to a given
1828 /// application address, whereby:
1829 ///
1830 /// Offset = (Addr & ~AndMask) ^ XorMask
1831 /// Shadow = ShadowBase + Offset
1832 /// Origin = (OriginBase + Offset) & ~Alignment
1833 ///
1834 /// Note: for efficiency, many shadow mappings only use the XorMask
1835 /// and OriginBase; the AndMask and ShadowBase are often zero.
1836 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1837 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1838 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1839
1840 if (uint64_t AndMask = MS.MapParams->AndMask)
1841 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1842
1843 if (uint64_t XorMask = MS.MapParams->XorMask)
1844 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1845 return OffsetLong;
1846 }
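// Worked example: with the default Linux/x86_64 mapping (AndMask and ShadowBase
// both zero, XorMask 0x500000000000, OriginBase 0x100000000000 -- constants that
// may differ on other targets), an application address A maps to
//   Shadow(A) = A ^ 0x500000000000
//   Origin(A) = ((A ^ 0x500000000000) + 0x100000000000) & ~3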
1847
1848 /// Compute the shadow and origin addresses corresponding to a given
1849 /// application address.
1850 ///
1851 /// Shadow = ShadowBase + Offset
1852 /// Origin = (OriginBase + Offset) & ~3ULL
1853 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1854 /// a single pointee.
1855 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1856 std::pair<Value *, Value *>
1857 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1858 MaybeAlign Alignment) {
1859 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1860 if (!VectTy) {
1861 assert(Addr->getType()->isPointerTy());
1862 } else {
1863 assert(VectTy->getElementType()->isPointerTy());
1864 }
1865 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1866 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1867 Value *ShadowLong = ShadowOffset;
1868 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1869 ShadowLong =
1870 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1871 }
1872 Value *ShadowPtr = IRB.CreateIntToPtr(
1873 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1874
1875 Value *OriginPtr = nullptr;
1876 if (MS.TrackOrigins) {
1877 Value *OriginLong = ShadowOffset;
1878 uint64_t OriginBase = MS.MapParams->OriginBase;
1879 if (OriginBase != 0)
1880 OriginLong =
1881 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1882 if (!Alignment || *Alignment < kMinOriginAlignment) {
1883 uint64_t Mask = kMinOriginAlignment.value() - 1;
1884 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1885 }
1886 OriginPtr = IRB.CreateIntToPtr(
1887 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1888 }
1889 return std::make_pair(ShadowPtr, OriginPtr);
1890 }
1891
1892 template <typename... ArgsTy>
1893 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1894 ArgsTy... Args) {
1895 if (MS.TargetTriple.getArch() == Triple::systemz) {
1896 IRB.CreateCall(Callee,
1897 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1898 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1899 }
1900
1901 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1902 }
1903
1904 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1905 IRBuilder<> &IRB,
1906 Type *ShadowTy,
1907 bool isStore) {
1908 Value *ShadowOriginPtrs;
1909 const DataLayout &DL = F.getDataLayout();
1910 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1911
1912 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1913 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1914 if (Getter) {
1915 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1916 } else {
1917 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1918 ShadowOriginPtrs = createMetadataCall(
1919 IRB,
1920 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1921 AddrCast, SizeVal);
1922 }
1923 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1924 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1925 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1926
1927 return std::make_pair(ShadowPtr, OriginPtr);
1928 }
1929
1930 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1931 /// a single pointee.
1932 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1933 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1934 IRBuilder<> &IRB,
1935 Type *ShadowTy,
1936 bool isStore) {
1937 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1938 if (!VectTy) {
1939 assert(Addr->getType()->isPointerTy());
1940 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1941 }
1942
1943 // TODO: Support callbacks with vectors of addresses.
1944 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1945 Value *ShadowPtrs = ConstantInt::getNullValue(
1946 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1947 Value *OriginPtrs = nullptr;
1948 if (MS.TrackOrigins)
1949 OriginPtrs = ConstantInt::getNullValue(
1950 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1951 for (unsigned i = 0; i < NumElements; ++i) {
1952 Value *OneAddr =
1953 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1954 auto [ShadowPtr, OriginPtr] =
1955 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1956
1957 ShadowPtrs = IRB.CreateInsertElement(
1958 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1959 if (MS.TrackOrigins)
1960 OriginPtrs = IRB.CreateInsertElement(
1961 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1962 }
1963 return {ShadowPtrs, OriginPtrs};
1964 }
1965
1966 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1967 Type *ShadowTy,
1968 MaybeAlign Alignment,
1969 bool isStore) {
1970 if (MS.CompileKernel)
1971 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1972 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1973 }
1974
1975 /// Compute the shadow address for a given function argument.
1976 ///
1977 /// Shadow = ParamTLS+ArgOffset.
1978 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1979 return IRB.CreatePtrAdd(MS.ParamTLS,
1980 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1981 }
1982
1983 /// Compute the origin address for a given function argument.
1984 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1985 if (!MS.TrackOrigins)
1986 return nullptr;
1987 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1988 ConstantInt::get(MS.IntptrTy, ArgOffset),
1989 "_msarg_o");
1990 }
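// For example, since argument TLS slots are aligned to kShadowTLSAlignment
// (8 bytes), the shadow of the second of two i64 parameters lives at
// __msan_param_tls + 8 and, when origins are tracked, its origin at
// __msan_param_origin_tls + 8.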
1991
1992 /// Compute the shadow address for a retval.
1993 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1994 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1995 }
1996
1997 /// Compute the origin address for a retval.
1998 Value *getOriginPtrForRetval() {
1999 // We keep a single origin for the entire retval. Might be too optimistic.
2000 return MS.RetvalOriginTLS;
2001 }
2002
2003 /// Set SV to be the shadow value for V.
2004 void setShadow(Value *V, Value *SV) {
2005 assert(!ShadowMap.count(V) && "Values may only have one shadow");
2006 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
2007 }
2008
2009 /// Set Origin to be the origin value for V.
2010 void setOrigin(Value *V, Value *Origin) {
2011 if (!MS.TrackOrigins)
2012 return;
2013 assert(!OriginMap.count(V) && "Values may only have one origin");
2014 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
2015 OriginMap[V] = Origin;
2016 }
2017
2018 Constant *getCleanShadow(Type *OrigTy) {
2019 Type *ShadowTy = getShadowTy(OrigTy);
2020 if (!ShadowTy)
2021 return nullptr;
2022 return Constant::getNullValue(ShadowTy);
2023 }
2024
2025 /// Create a clean shadow value for a given value.
2026 ///
2027 /// Clean shadow (all zeroes) means all bits of the value are defined
2028 /// (initialized).
2029 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2030
2031 /// Create a dirty shadow of a given shadow type.
2032 Constant *getPoisonedShadow(Type *ShadowTy) {
2033 assert(ShadowTy);
2034 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2035 return Constant::getAllOnesValue(ShadowTy);
2036 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2037 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2038 getPoisonedShadow(AT->getElementType()));
2039 return ConstantArray::get(AT, Vals);
2040 }
2041 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2042 SmallVector<Constant *, 4> Vals;
2043 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2044 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2045 return ConstantStruct::get(ST, Vals);
2046 }
2047 llvm_unreachable("Unexpected shadow type");
2048 }
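// E.g., the poisoned shadow of the shadow type { i32, <4 x i16> } is the
// all-ones constant { i32 -1, <4 x i16> splat(i16 -1) }: every shadow bit set.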
2049
2050 /// Create a dirty shadow for a given value.
2051 Constant *getPoisonedShadow(Value *V) {
2052 Type *ShadowTy = getShadowTy(V);
2053 if (!ShadowTy)
2054 return nullptr;
2055 return getPoisonedShadow(ShadowTy);
2056 }
2057
2058 /// Create a clean (zero) origin.
2059 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2060
2061 /// Get the shadow value for a given Value.
2062 ///
2063 /// This function either returns the value set earlier with setShadow,
2064 /// or extracts it from ParamTLS (for function arguments).
2065 Value *getShadow(Value *V) {
2066 if (Instruction *I = dyn_cast<Instruction>(V)) {
2067 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2068 return getCleanShadow(V);
2069 // For instructions the shadow is already stored in the map.
2070 Value *Shadow = ShadowMap[V];
2071 if (!Shadow) {
2072 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2073 assert(Shadow && "No shadow for a value");
2074 }
2075 return Shadow;
2076 }
2077 // Handle fully undefined values
2078 // (partially undefined constant vectors are handled later)
2079 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2080 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2081 : getCleanShadow(V);
2082 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2083 return AllOnes;
2084 }
2085 if (Argument *A = dyn_cast<Argument>(V)) {
2086 // For arguments we compute the shadow on demand and store it in the map.
2087 Value *&ShadowPtr = ShadowMap[V];
2088 if (ShadowPtr)
2089 return ShadowPtr;
2090 Function *F = A->getParent();
2091 IRBuilder<> EntryIRB(FnPrologueEnd);
2092 unsigned ArgOffset = 0;
2093 const DataLayout &DL = F->getDataLayout();
2094 for (auto &FArg : F->args()) {
2095 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2096 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2097 ? "vscale not fully supported\n"
2098 : "Arg is not sized\n"));
2099 if (A == &FArg) {
2100 ShadowPtr = getCleanShadow(V);
2101 setOrigin(A, getCleanOrigin());
2102 break;
2103 }
2104 continue;
2105 }
2106
2107 unsigned Size = FArg.hasByValAttr()
2108 ? DL.getTypeAllocSize(FArg.getParamByValType())
2109 : DL.getTypeAllocSize(FArg.getType());
2110
2111 if (A == &FArg) {
2112 bool Overflow = ArgOffset + Size > kParamTLSSize;
2113 if (FArg.hasByValAttr()) {
2114 // ByVal pointer itself has clean shadow. We copy the actual
2115 // argument shadow to the underlying memory.
2116 // Figure out maximal valid memcpy alignment.
2117 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2118 FArg.getParamAlign(), FArg.getParamByValType());
2119 Value *CpShadowPtr, *CpOriginPtr;
2120 std::tie(CpShadowPtr, CpOriginPtr) =
2121 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2122 /*isStore*/ true);
2123 if (!PropagateShadow || Overflow) {
2124 // ParamTLS overflow.
2125 EntryIRB.CreateMemSet(
2126 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2127 Size, ArgAlign);
2128 } else {
2129 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2130 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2131 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2132 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2133 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2134
2135 if (MS.TrackOrigins) {
2136 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2137 // FIXME: OriginSize should be:
2138 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2139 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2140 EntryIRB.CreateMemCpy(
2141 CpOriginPtr,
2142 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2143 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2144 OriginSize);
2145 }
2146 }
2147 }
2148
2149 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2150 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2151 ShadowPtr = getCleanShadow(V);
2152 setOrigin(A, getCleanOrigin());
2153 } else {
2154 // Shadow over TLS
2155 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2156 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2157 kShadowTLSAlignment);
2158 if (MS.TrackOrigins) {
2159 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2160 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2161 }
2162 }
2163 LLVM_DEBUG(dbgs()
2164 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2165 break;
2166 }
2167
2168 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2169 }
2170 assert(ShadowPtr && "Could not find shadow for an argument");
2171 return ShadowPtr;
2172 }
2173
2174 // Check for partially-undefined constant vectors
2175 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2176 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2177 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2178 PoisonUndefVectors) {
2179 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2180 SmallVector<Constant *, 32> ShadowVector(NumElems);
2181 for (unsigned i = 0; i != NumElems; ++i) {
2182 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2183 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2184 : getCleanShadow(Elem);
2185 }
2186
2187 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2188 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2189 << *ShadowConstant << "\n");
2190
2191 return ShadowConstant;
2192 }
2193
2194 // TODO: partially-undefined constant arrays, structures, and nested types
2195
2196 // For everything else the shadow is zero.
2197 return getCleanShadow(V);
2198 }
2199
2200 /// Get the shadow for i-th argument of the instruction I.
2201 Value *getShadow(Instruction *I, int i) {
2202 return getShadow(I->getOperand(i));
2203 }
2204
2205 /// Get the origin for a value.
2206 Value *getOrigin(Value *V) {
2207 if (!MS.TrackOrigins)
2208 return nullptr;
2209 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2210 return getCleanOrigin();
2212 "Unexpected value type in getOrigin()");
2213 if (Instruction *I = dyn_cast<Instruction>(V)) {
2214 if (I->getMetadata(LLVMContext::MD_nosanitize))
2215 return getCleanOrigin();
2216 }
2217 Value *Origin = OriginMap[V];
2218 assert(Origin && "Missing origin");
2219 return Origin;
2220 }
2221
2222 /// Get the origin for i-th argument of the instruction I.
2223 Value *getOrigin(Instruction *I, int i) {
2224 return getOrigin(I->getOperand(i));
2225 }
2226
2227 /// Remember the place where a shadow check should be inserted.
2228 ///
2229 /// This location will be later instrumented with a check that will print a
2230 /// UMR warning at runtime if the shadow value is not 0.
2231 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2232 assert(Shadow);
2233 if (!InsertChecks)
2234 return;
2235
2236 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2237 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2238 << *OrigIns << "\n");
2239 return;
2240 }
2241
2242 Type *ShadowTy = Shadow->getType();
2243 if (isScalableNonVectorType(ShadowTy)) {
2244 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2245 << " before " << *OrigIns << "\n");
2246 return;
2247 }
2248#ifndef NDEBUG
2249 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2250 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2251 "Can only insert checks for integer, vector, and aggregate shadow "
2252 "types");
2253#endif
2254 InstrumentationList.push_back(
2255 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2256 }
2257
2258 /// Get shadow for value, and remember the place where a shadow check should
2259 /// be inserted.
2260 ///
2261 /// This location will be later instrumented with a check that will print a
2262 /// UMR warning at runtime if the value is not fully defined.
2263 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2264 assert(Val);
2265 Value *Shadow, *Origin;
2266 if (ClCheckConstantShadow) {
2267 Shadow = getShadow(Val);
2268 if (!Shadow)
2269 return;
2270 Origin = getOrigin(Val);
2271 } else {
2272 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2273 if (!Shadow)
2274 return;
2275 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2276 }
2277 insertCheckShadow(Shadow, Origin, OrigIns);
2278 }
2279
2280 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2281 switch (a) {
2282 case AtomicOrdering::NotAtomic:
2283 return AtomicOrdering::NotAtomic;
2284 case AtomicOrdering::Unordered:
2285 case AtomicOrdering::Monotonic:
2286 case AtomicOrdering::Release:
2287 return AtomicOrdering::Release;
2288 case AtomicOrdering::Acquire:
2289 case AtomicOrdering::AcquireRelease:
2290 return AtomicOrdering::AcquireRelease;
2291 case AtomicOrdering::SequentiallyConsistent:
2292 return AtomicOrdering::SequentiallyConsistent;
2293 }
2294 llvm_unreachable("Unknown ordering");
2295 }
2296
2297 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2298 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2299 uint32_t OrderingTable[NumOrderings] = {};
2300
2301 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2302 OrderingTable[(int)AtomicOrderingCABI::release] =
2303 (int)AtomicOrderingCABI::release;
2304 OrderingTable[(int)AtomicOrderingCABI::consume] =
2305 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2306 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2307 (int)AtomicOrderingCABI::acq_rel;
2308 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2309 (int)AtomicOrderingCABI::seq_cst;
2310
2311 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2312 }
2313
2314 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2315 switch (a) {
2316 case AtomicOrdering::NotAtomic:
2317 return AtomicOrdering::NotAtomic;
2318 case AtomicOrdering::Unordered:
2319 case AtomicOrdering::Monotonic:
2320 case AtomicOrdering::Acquire:
2321 return AtomicOrdering::Acquire;
2322 case AtomicOrdering::Release:
2323 case AtomicOrdering::AcquireRelease:
2324 return AtomicOrdering::AcquireRelease;
2325 case AtomicOrdering::SequentiallyConsistent:
2326 return AtomicOrdering::SequentiallyConsistent;
2327 }
2328 llvm_unreachable("Unknown ordering");
2329 }
2330
2331 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2332 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2333 uint32_t OrderingTable[NumOrderings] = {};
2334
2335 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2336 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2337 OrderingTable[(int)AtomicOrderingCABI::consume] =
2338 (int)AtomicOrderingCABI::acquire;
2339 OrderingTable[(int)AtomicOrderingCABI::release] =
2340 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2341 (int)AtomicOrderingCABI::acq_rel;
2342 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2343 (int)AtomicOrderingCABI::seq_cst;
2344
2345 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2346 }
2347
2348 // ------------------- Visitors.
2349 using InstVisitor<MemorySanitizerVisitor>::visit;
2350 void visit(Instruction &I) {
2351 if (I.getMetadata(LLVMContext::MD_nosanitize))
2352 return;
2353 // Don't want to visit if we're in the prologue
2354 if (isInPrologue(I))
2355 return;
2356 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2357 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2358 // We still need to set the shadow and origin to clean values.
2359 setShadow(&I, getCleanShadow(&I));
2360 setOrigin(&I, getCleanOrigin());
2361 return;
2362 }
2363
2364 Instructions.push_back(&I);
2365 }
2366
2367 /// Instrument LoadInst
2368 ///
2369 /// Loads the corresponding shadow and (optionally) origin.
2370 /// Optionally, checks that the load address is fully defined.
2371 void visitLoadInst(LoadInst &I) {
2372 assert(I.getType()->isSized() && "Load type must have size");
2373 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2374 NextNodeIRBuilder IRB(&I);
2375 Type *ShadowTy = getShadowTy(&I);
2376 Value *Addr = I.getPointerOperand();
2377 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2378 const Align Alignment = I.getAlign();
2379 if (PropagateShadow) {
2380 std::tie(ShadowPtr, OriginPtr) =
2381 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2382 setShadow(&I,
2383 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2384 } else {
2385 setShadow(&I, getCleanShadow(&I));
2386 }
2387
2388 if (ClCheckAccessAddress)
2389 insertCheckShadowOf(I.getPointerOperand(), &I);
2390
2391 if (I.isAtomic())
2392 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2393
2394 if (MS.TrackOrigins) {
2395 if (PropagateShadow) {
2396 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2397 setOrigin(
2398 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2399 } else {
2400 setOrigin(&I, getCleanOrigin());
2401 }
2402 }
2403 }
2404
2405 /// Instrument StoreInst
2406 ///
2407 /// Stores the corresponding shadow and (optionally) origin.
2408 /// Optionally, checks that the store address is fully defined.
2409 void visitStoreInst(StoreInst &I) {
2410 StoreList.push_back(&I);
2411 if (ClCheckAccessAddress)
2412 insertCheckShadowOf(I.getPointerOperand(), &I);
2413 }
2414
2415 void handleCASOrRMW(Instruction &I) {
2416 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2417
2418 IRBuilder<> IRB(&I);
2419 Value *Addr = I.getOperand(0);
2420 Value *Val = I.getOperand(1);
2421 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2422 /*isStore*/ true)
2423 .first;
2424
2425 if (ClCheckAccessAddress)
2426 insertCheckShadowOf(Addr, &I);
2427
2428 // Only test the conditional argument of cmpxchg instruction.
2429 // The other argument can potentially be uninitialized, but we cannot
2430 // detect this situation reliably without possible false positives.
2431 if (isa<AtomicCmpXchgInst>(I))
2432 insertCheckShadowOf(Val, &I);
2433
2434 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2435
2436 setShadow(&I, getCleanShadow(&I));
2437 setOrigin(&I, getCleanOrigin());
2438 }
2439
2440 void visitAtomicRMWInst(AtomicRMWInst &I) {
2441 handleCASOrRMW(I);
2442 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2443 }
2444
2445 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2446 handleCASOrRMW(I);
2447 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2448 }
2449
2450 /// Generic handler to compute shadow for == and != comparisons.
2451 ///
2452 /// This function is used by handleEqualityComparison and visitSwitchInst.
2453 ///
2454 /// Sometimes the comparison result is known even if some of the bits of the
2455 /// arguments are not.
2456 Value *propagateEqualityComparison(IRBuilder<> &IRB, Value *A, Value *B,
2457 Value *Sa, Value *Sb) {
2458 assert(getShadowTy(A) == Sa->getType());
2459 assert(getShadowTy(B) == Sb->getType());
2460
2461 // Get rid of pointers and vectors of pointers.
2462 // For ints (and vectors of ints), types of A and Sa match,
2463 // and this is a no-op.
2464 A = IRB.CreatePointerCast(A, Sa->getType());
2465 B = IRB.CreatePointerCast(B, Sb->getType());
2466
2467 // A == B <==> (C = A^B) == 0
2468 // A != B <==> (C = A^B) != 0
2469 // Sc = Sa | Sb
2470 Value *C = IRB.CreateXor(A, B);
2471 Value *Sc = IRB.CreateOr(Sa, Sb);
2472 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2473 // Result is defined if one of the following is true
2474 // * there is a defined 1 bit in C
2475 // * C is fully defined
2476 // Si = !(C & ~Sc) && Sc
2477 Value *Zero = Constant::getNullValue(Sc->getType());
2478 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2479 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2480 Value *RHS =
2481 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2482 Value *Si = IRB.CreateAnd(LHS, RHS);
2483 Si->setName("_msprop_icmp");
2484
2485 return Si;
2486 }
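// Worked example (sketch): let A = 0b10?? (low two bits undefined, Sa = 0b0011)
// and B = 0b0000 (Sb = 0). Then C = A ^ B has a bit that is both defined and 1
// (bit 3), so RHS above is false and Si = 0: the result of A != B is known to be
// true regardless of the undefined low bits, and no false report is produced.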
2487
2488 // Instrument:
2489 // switch i32 %Val, label %else [ i32 0, label %A
2490 // i32 1, label %B
2491 // i32 2, label %C ]
2492 //
2493 // Typically, the switch input value (%Val) is fully initialized.
2494 //
2495 // Sometimes the compiler may convert (icmp + br) into a switch statement.
2496 // MSan allows icmp eq/ne with partly initialized inputs to still result in a
2497 // fully initialized output, if there exists a bit that is initialized in
2498 // both inputs with a differing value. For compatibility, we support this in
2499 // the switch instrumentation as well. Note that this edge case only applies
2500 // if the switch input value does not match *any* of the cases (matching any
2501 // of the cases requires an exact, fully initialized match).
2502 //
2503 // ShadowCases = 0
2504 // | propagateEqualityComparison(Val, 0)
2505 // | propagateEqualityComparison(Val, 1)
2506 // | propagateEqualityComparison(Val, 2))
2507 void visitSwitchInst(SwitchInst &SI) {
2508 IRBuilder<> IRB(&SI);
2509
2510 Value *Val = SI.getCondition();
2511 Value *ShadowVal = getShadow(Val);
2512 // TODO: add fast path - if the condition is fully initialized, we know
2513 // there is no UUM, without needing to consider the case values below.
2514
2515 // Some code (e.g., AMDGPUGenMCCodeEmitter.inc) has tens of thousands of
2516 // cases. This results in an extremely long chained expression for MSan's
2517 // switch instrumentation, which can cause the JumpThreadingPass to have a
2518 // stack overflow or excessive runtime. We limit the number of cases
2519 // considered, with the tradeoff of niche false negatives.
2520 // TODO: figure out a better solution.
2521 int casesToConsider = ClSwitchPrecision;
2522
2523 Value *ShadowCases = nullptr;
2524 for (auto Case : SI.cases()) {
2525 if (casesToConsider <= 0)
2526 break;
2527
2528 Value *Comparator = Case.getCaseValue();
2529 // TODO: some simplification is possible when comparing multiple cases
2530 // simultaneously.
2531 Value *ComparisonShadow = propagateEqualityComparison(
2532 IRB, Val, Comparator, ShadowVal, getShadow(Comparator));
2533
2534 if (ShadowCases)
2535 ShadowCases = IRB.CreateOr(ShadowCases, ComparisonShadow);
2536 else
2537 ShadowCases = ComparisonShadow;
2538
2539 casesToConsider--;
2540 }
2541
2542 if (ShadowCases)
2543 insertCheckShadow(ShadowCases, getOrigin(Val), &SI);
2544 }
2545
2546 // Vector manipulation.
2547 void visitExtractElementInst(ExtractElementInst &I) {
2548 insertCheckShadowOf(I.getOperand(1), &I);
2549 IRBuilder<> IRB(&I);
2550 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2551 "_msprop"));
2552 setOrigin(&I, getOrigin(&I, 0));
2553 }
2554
2555 void visitInsertElementInst(InsertElementInst &I) {
2556 insertCheckShadowOf(I.getOperand(2), &I);
2557 IRBuilder<> IRB(&I);
2558 auto *Shadow0 = getShadow(&I, 0);
2559 auto *Shadow1 = getShadow(&I, 1);
2560 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2561 "_msprop"));
2562 setOriginForNaryOp(I);
2563 }
2564
2565 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2566 IRBuilder<> IRB(&I);
2567 auto *Shadow0 = getShadow(&I, 0);
2568 auto *Shadow1 = getShadow(&I, 1);
2569 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2570 "_msprop"));
2571 setOriginForNaryOp(I);
2572 }
2573
2574 // Casts.
2575 void visitSExtInst(SExtInst &I) {
2576 IRBuilder<> IRB(&I);
2577 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2578 setOrigin(&I, getOrigin(&I, 0));
2579 }
2580
2581 void visitZExtInst(ZExtInst &I) {
2582 IRBuilder<> IRB(&I);
2583 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2584 setOrigin(&I, getOrigin(&I, 0));
2585 }
2586
2587 void visitTruncInst(TruncInst &I) {
2588 IRBuilder<> IRB(&I);
2589 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2590 setOrigin(&I, getOrigin(&I, 0));
2591 }
2592
2593 void visitBitCastInst(BitCastInst &I) {
2594 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2595 // a musttail call and a ret, don't instrument. New instructions are not
2596 // allowed after a musttail call.
2597 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2598 if (CI->isMustTailCall())
2599 return;
2600 IRBuilder<> IRB(&I);
2601 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2602 setOrigin(&I, getOrigin(&I, 0));
2603 }
2604
2605 void visitPtrToIntInst(PtrToIntInst &I) {
2606 IRBuilder<> IRB(&I);
2607 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2608 "_msprop_ptrtoint"));
2609 setOrigin(&I, getOrigin(&I, 0));
2610 }
2611
2612 void visitIntToPtrInst(IntToPtrInst &I) {
2613 IRBuilder<> IRB(&I);
2614 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2615 "_msprop_inttoptr"));
2616 setOrigin(&I, getOrigin(&I, 0));
2617 }
2618
2619 /// Handle LLVM and NEON vector convert intrinsics.
2620 ///
2621 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
2622 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64 (double)
2623 /// <2 x i32> @fptoui (<2 x float>)
2624 /// i64 @llvm.fptosi.sat.i64.f64(double)
2625 ///
2626 /// Note that the size of input/output elements can differ e.g.,
2627 /// double @sitofp(i32)
2628 /// but the number of elements must be the same.
2629 ///
2630 /// For conversions to or from fixed-point, there is a trailing argument to
2631 /// indicate the fixed-point precision:
2632 /// - <4 x float> llvm.aarch64.neon.vcvtfxs2fp.v4f32.v4i32(<4 x i32>, i32)
2633 /// - <4 x i32> llvm.aarch64.neon.vcvtfp2fxu.v4i32.v4f32(<4 x float>, i32)
2634 ///
2635 /// For x86 SSE vector convert intrinsics, see
2636 /// handleSSEVectorConvertIntrinsic().
2637 void handleGenericVectorConvertIntrinsic(Instruction &I, bool FixedPoint) {
2638 [[maybe_unused]] unsigned NumArgs = I.getNumOperands();
2639 if (auto *CI = dyn_cast<CallInst>(&I))
2640 NumArgs = CI->arg_size();
2641
2642 if (FixedPoint) {
2643 assert(NumArgs == 2);
2644 Value *Precision = I.getOperand(1);
2645 insertCheckShadowOf(Precision, &I);
2646 } else {
2647 assert(NumArgs == 1);
2648 }
2649
2650 IRBuilder<> IRB(&I);
2651 Value *S0 = getShadow(&I, 0);
2652
2653 /// For scalars:
2654 /// Since they are converting from floating-point to integer, the output is
2655 /// - fully uninitialized if *any* bit of the input is uninitialized
2656 /// - fully initialized if all bits of the input are initialized
2657 /// We apply the same principle on a per-field basis for vectors.
2658 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
2659 getShadowTy(&I));
2660 setShadow(&I, OutShadow);
2661 setOriginForNaryOp(I);
2662 }
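// For example (sketch): converting <2 x float> %v whose shadow is <i32 0, i32 42>
// yields an output shadow of sext(icmp ne <0, 42>, <0, 0>) = <i32 0, i32 -1>:
// lane 0 stays fully initialized, lane 1 becomes fully uninitialized.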
2663
2664 void visitFPToSIInst(CastInst &I) {
2665 handleGenericVectorConvertIntrinsic(I, /*FixedPoint=*/false);
2666 }
2667 void visitFPToUIInst(CastInst &I) {
2668 handleGenericVectorConvertIntrinsic(I, /*FixedPoint=*/false);
2669 }
2670 void visitSIToFPInst(CastInst &I) {
2671 handleGenericVectorConvertIntrinsic(I, /*FixedPoint=*/false);
2672 }
2673 void visitUIToFPInst(CastInst &I) {
2674 handleGenericVectorConvertIntrinsic(I, /*FixedPoint=*/false);
2675 }
2676 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2677 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2678
2679 /// Generic handler to compute shadow for bitwise AND.
2680 ///
2681 /// This is used by 'visitAnd' but also as a primitive for other handlers.
2682 ///
2683 /// This code is precise: it implements the rule that "And" of an initialized
2684 /// zero bit always results in an initialized value:
2685 // 1&1 => 1; 0&1 => 0; p&1 => p;
2686 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2687 // 1&p => p; 0&p => 0; p&p => p;
2688 //
2689 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2690 Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
2691 Value *S2) {
2692 // "The two arguments to the ‘and’ instruction must be integer or vector
2693 // of integer values. Both arguments must have identical types."
2694 //
2695 // We enforce this condition for all callers to handleBitwiseAnd(); callers
2696 // with non-integer types should call CreateAppToShadowCast() themselves.
2697 assert(V1->getType()->isIntOrIntVectorTy());
2698 assert(V1->getType() == V2->getType());
2699
2700 // Conveniently, getShadowTy() of Int/IntVector returns the original type.
2701 assert(V1->getType() == S1->getType());
2702 assert(V2->getType() == S2->getType());
2703
2704 Value *S1S2 = IRB.CreateAnd(S1, S2);
2705 Value *V1S2 = IRB.CreateAnd(V1, S2);
2706 Value *S1V2 = IRB.CreateAnd(S1, V2);
2707
2708 return IRB.CreateOr({S1S2, V1S2, S1V2});
2709 }
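// Worked example: V1 = 0b1100 with S1 = 0b0011 (low two bits undefined) and a
// fully defined V2 = 0b0101 (S2 = 0). Then S = 0 | 0 | (S1 & V2) = 0b0001: the
// undefined bit 1 of V1 is ANDed with a defined 0 bit of V2, so only bit 0 of
// the result remains undefined.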
2710
2711 /// Handler for bitwise AND operator.
2712 void visitAnd(BinaryOperator &I) {
2713 IRBuilder<> IRB(&I);
2714 Value *V1 = I.getOperand(0);
2715 Value *V2 = I.getOperand(1);
2716 Value *S1 = getShadow(&I, 0);
2717 Value *S2 = getShadow(&I, 1);
2718
2719 Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);
2720
2721 setShadow(&I, OutShadow);
2722 setOriginForNaryOp(I);
2723 }
2724
2725 void visitOr(BinaryOperator &I) {
2726 IRBuilder<> IRB(&I);
2727 // "Or" of 1 and a poisoned value results in unpoisoned value:
2728 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2729 // 1|0 => 1; 0|0 => 0; p|0 => p;
2730 // 1|p => 1; 0|p => p; p|p => p;
2731 //
2732 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2733 //
2734 // If the "disjoint OR" property is violated, the result is poison, and
2735 // hence the entire shadow is uninitialized:
2736 // S = S | SignExt(V1 & V2 != 0)
2737 Value *S1 = getShadow(&I, 0);
2738 Value *S2 = getShadow(&I, 1);
2739 Value *V1 = I.getOperand(0);
2740 Value *V2 = I.getOperand(1);
2741
2742 // "The two arguments to the ‘or’ instruction must be integer or vector
2743 // of integer values. Both arguments must have identical types."
2744 assert(V1->getType()->isIntOrIntVectorTy());
2745 assert(V1->getType() == V2->getType());
2746
2747 // Conveniently, getShadowTy() of Int/IntVector returns the original type.
2748 assert(V1->getType() == S1->getType());
2749 assert(V2->getType() == S2->getType());
2750
2751 Value *NotV1 = IRB.CreateNot(V1);
2752 Value *NotV2 = IRB.CreateNot(V2);
2753
2754 Value *S1S2 = IRB.CreateAnd(S1, S2);
2755 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2756 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2757
2758 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2759
2760 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2761 Value *V1V2 = IRB.CreateAnd(V1, V2);
2762 Value *DisjointOrShadow = IRB.CreateSExt(
2763 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2764 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2765 }
2766
2767 setShadow(&I, S);
2768 setOriginForNaryOp(I);
2769 }
2770
2771 /// Default propagation of shadow and/or origin.
2772 ///
2773 /// This class implements the general case of shadow propagation, used in all
2774 /// cases where we don't know and/or don't care about what the operation
2775 /// actually does. It converts all input shadow values to a common type
2776 /// (extending or truncating as necessary), and bitwise OR's them.
2777 ///
2778 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2779 /// fully initialized), and less prone to false positives.
2780 ///
2781 /// This class also implements the general case of origin propagation. For a
2782 /// Nary operation, result origin is set to the origin of an argument that is
2783 /// not entirely initialized. If there is more than one such argument, the
2784 /// rightmost of them is picked. It does not matter which one is picked if all
2785 /// arguments are initialized.
2786 template <bool CombineShadow> class Combiner {
2787 Value *Shadow = nullptr;
2788 Value *Origin = nullptr;
2789 IRBuilder<> &IRB;
2790 MemorySanitizerVisitor *MSV;
2791
2792 public:
2793 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2794 : IRB(IRB), MSV(MSV) {}
2795
2796 /// Add a pair of shadow and origin values to the mix.
2797 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2798 if (CombineShadow) {
2799 assert(OpShadow);
2800 if (!Shadow)
2801 Shadow = OpShadow;
2802 else {
2803 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2804 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2805 }
2806 }
2807
2808 if (MSV->MS.TrackOrigins) {
2809 assert(OpOrigin);
2810 if (!Origin) {
2811 Origin = OpOrigin;
2812 } else {
2813 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2814 // No point in adding something that might result in 0 origin value.
2815 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2816 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2817 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2818 }
2819 }
2820 }
2821 return *this;
2822 }
2823
2824 /// Add an application value to the mix.
2825 Combiner &Add(Value *V) {
2826 Value *OpShadow = MSV->getShadow(V);
2827 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2828 return Add(OpShadow, OpOrigin);
2829 }
2830
2831 /// Set the current combined values as the given instruction's shadow
2832 /// and origin.
2833 void Done(Instruction *I) {
2834 if (CombineShadow) {
2835 assert(Shadow);
2836 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2837 MSV->setShadow(I, Shadow);
2838 }
2839 if (MSV->MS.TrackOrigins) {
2840 assert(Origin);
2841 MSV->setOrigin(I, Origin);
2842 }
2843 }
2844
2845 /// Store the current combined value at the specified origin
2846 /// location.
2847 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2848 if (MSV->MS.TrackOrigins) {
2849 assert(Origin);
2850 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2851 }
2852 }
2853 };
2854
2855 using ShadowAndOriginCombiner = Combiner<true>;
2856 using OriginCombiner = Combiner<false>;
2857
2858 /// Propagate origin for arbitrary operation.
2859 void setOriginForNaryOp(Instruction &I) {
2860 if (!MS.TrackOrigins)
2861 return;
2862 IRBuilder<> IRB(&I);
2863 OriginCombiner OC(this, IRB);
2864 for (Use &Op : I.operands())
2865 OC.Add(Op.get());
2866 OC.Done(&I);
2867 }
2868
2869 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2870 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2871 "Vector of pointers is not a valid shadow type");
2872 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2873 Ty->getScalarSizeInBits()
2874 : Ty->getPrimitiveSizeInBits();
2875 }
2876
2877 /// Cast between two shadow types, extending or truncating as
2878 /// necessary.
2879 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2880 bool Signed = false) {
2881 Type *srcTy = V->getType();
2882 if (srcTy == dstTy)
2883 return V;
2884 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2885 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2886 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2887 return IRB.CreateICmpNE(V, getCleanShadow(V));
2888
2889 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2890 return IRB.CreateIntCast(V, dstTy, Signed);
2891 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2892 cast<VectorType>(dstTy)->getElementCount() ==
2893 cast<VectorType>(srcTy)->getElementCount())
2894 return IRB.CreateIntCast(V, dstTy, Signed);
2895 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2896 Value *V2 =
2897 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2898 return IRB.CreateBitCast(V2, dstTy);
2899 // TODO: handle struct types.
2900 }
2901
2902 /// Cast an application value to the type of its own shadow.
2903 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2904 Type *ShadowTy = getShadowTy(V);
2905 if (V->getType() == ShadowTy)
2906 return V;
2907 if (V->getType()->isPtrOrPtrVectorTy())
2908 return IRB.CreatePtrToInt(V, ShadowTy);
2909 else
2910 return IRB.CreateBitCast(V, ShadowTy);
2911 }
2912
2913 /// Propagate shadow for arbitrary operation.
2914 void handleShadowOr(Instruction &I) {
2915 IRBuilder<> IRB(&I);
2916 ShadowAndOriginCombiner SC(this, IRB);
2917 for (Use &Op : I.operands())
2918 SC.Add(Op.get());
2919 SC.Done(&I);
2920 }
2921
2922 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2923 // of elements.
2924 //
2925 // For example, suppose we have:
2926 // VectorA: <a0, a1, a2, a3, a4, a5>
2927 // VectorB: <b0, b1, b2, b3, b4, b5>
2928 // ReductionFactor: 3
2929 // Shards: 1
2930 // The output would be:
2931 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2932 //
2933 // If we have:
2934 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2935 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2936 // ReductionFactor: 2
2937 // Shards: 2
2938 // then A and B each have 2 "shards", resulting in the output being
2939 // interleaved:
2940 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2941 //
2942 // This is convenient for instrumenting horizontal add/sub.
2943 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2944 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2945 unsigned Shards, Value *VectorA, Value *VectorB) {
2946 assert(isa<FixedVectorType>(VectorA->getType()));
2947 unsigned NumElems =
2948 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2949
2950 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2951 if (VectorB) {
2952 assert(VectorA->getType() == VectorB->getType());
2953 TotalNumElems *= 2;
2954 }
2955
2956 assert(NumElems % (ReductionFactor * Shards) == 0);
2957
2958 Value *Or = nullptr;
2959
2960 IRBuilder<> IRB(&I);
2961 for (unsigned i = 0; i < ReductionFactor; i++) {
2962 SmallVector<int, 16> Mask;
2963
2964 for (unsigned j = 0; j < Shards; j++) {
2965 unsigned Offset = NumElems / Shards * j;
2966
2967 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2968 Mask.push_back(Offset + X + i);
2969
2970 if (VectorB) {
2971 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2972 Mask.push_back(NumElems + Offset + X + i);
2973 }
2974 }
2975
2976 Value *Masked;
2977 if (VectorB)
2978 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2979 else
2980 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2981
2982 if (Or)
2983 Or = IRB.CreateOr(Or, Masked);
2984 else
2985 Or = Masked;
2986 }
2987
2988 return Or;
2989 }
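// For example, with two 4-element vectors, ReductionFactor = 2 and Shards = 1,
// the loop builds the shuffle masks <0, 2, 4, 6> and <1, 3, 5, 7> over the
// concatenation A:B; OR-ing the two shuffles yields
//   <a0|a1, a2|a3, b0|b1, b2|b3>.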
2990
2991 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2992 /// fields.
2993 ///
2994 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2995 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2996 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2997 assert(I.arg_size() == 1 || I.arg_size() == 2);
2998
2999 assert(I.getType()->isVectorTy());
3000 assert(I.getArgOperand(0)->getType()->isVectorTy());
3001
3002 [[maybe_unused]] FixedVectorType *ParamType =
3003 cast<FixedVectorType>(I.getArgOperand(0)->getType());
3004 assert((I.arg_size() != 2) ||
3005 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
3006 [[maybe_unused]] FixedVectorType *ReturnType =
3007 cast<FixedVectorType>(I.getType());
3008 assert(ParamType->getNumElements() * I.arg_size() ==
3009 2 * ReturnType->getNumElements());
3010
3011 IRBuilder<> IRB(&I);
3012
3013 // Horizontal OR of shadow
3014 Value *FirstArgShadow = getShadow(&I, 0);
3015 Value *SecondArgShadow = nullptr;
3016 if (I.arg_size() == 2)
3017 SecondArgShadow = getShadow(&I, 1);
3018
3019 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
3020 FirstArgShadow, SecondArgShadow);
3021
3022 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
3023
3024 setShadow(&I, OrShadow);
3025 setOriginForNaryOp(I);
3026 }
3027
3028 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
3029 /// fields, with the parameters reinterpreted to have elements of a specified
3030 /// width. For example:
3031 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
3032 /// conceptually operates on
3033 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
3034 /// and can be handled with ReinterpretElemWidth == 16.
3035 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
3036 int ReinterpretElemWidth) {
3037 assert(I.arg_size() == 1 || I.arg_size() == 2);
3038
3039 assert(I.getType()->isVectorTy());
3040 assert(I.getArgOperand(0)->getType()->isVectorTy());
3041
3042 FixedVectorType *ParamType =
3043 cast<FixedVectorType>(I.getArgOperand(0)->getType());
3044 assert((I.arg_size() != 2) ||
3045 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
3046
3047 [[maybe_unused]] FixedVectorType *ReturnType =
3048 cast<FixedVectorType>(I.getType());
3049 assert(ParamType->getNumElements() * I.arg_size() ==
3050 2 * ReturnType->getNumElements());
3051
3052 IRBuilder<> IRB(&I);
3053
3054 FixedVectorType *ReinterpretShadowTy = nullptr;
3055 assert(isAligned(Align(ReinterpretElemWidth),
3056 ParamType->getPrimitiveSizeInBits()));
3057 ReinterpretShadowTy = FixedVectorType::get(
3058 IRB.getIntNTy(ReinterpretElemWidth),
3059 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
3060
3061 // Horizontal OR of shadow
3062 Value *FirstArgShadow = getShadow(&I, 0);
3063 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
3064
3065 // If we had two parameters each with an odd number of elements, the total
3066 // number of elements is even, but we have never seen this in extant
3067 // instruction sets, so we enforce that each parameter must have an even
3068 // number of elements.
3069 assert(isAligned(
3070 Align(2),
3071 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
3072
3073 Value *SecondArgShadow = nullptr;
3074 if (I.arg_size() == 2) {
3075 SecondArgShadow = getShadow(&I, 1);
3076 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
3077 }
3078
3079 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
3080 FirstArgShadow, SecondArgShadow);
3081
3082 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
3083
3084 setShadow(&I, OrShadow);
3085 setOriginForNaryOp(I);
3086 }
3087
3088 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
3089
3090 // Handle multiplication by constant.
3091 //
3092 // Handle a special case of multiplication by constant that may have one or
3093 // more zeros in the lower bits. This makes the corresponding number of lower bits
3094 // of the result zero as well. We model it by shifting the other operand
3095 // shadow left by the required number of bits. Effectively, we transform
3096 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
3097 // We use multiplication by 2**N instead of shift to cover the case of
3098 // multiplication by 0, which may occur in some elements of a vector operand.
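// As an illustrative sketch (values are hypothetical): multiplying by the
// constant 24 = 3 * 2**3 forces the three low bits of the result to zero, so
// for
//   %r = mul i32 %x, 24
// the propagated shadow is
//   %sr = mul i32 %sx, 8   ; i.e. Sx << 3, with the low 3 bits left clean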
3099 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
3100 Value *OtherArg) {
3101 Constant *ShadowMul;
3102 Type *Ty = ConstArg->getType();
3103 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
3104 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
3105 Type *EltTy = VTy->getElementType();
3106 SmallVector<Constant *, 16> Elements;
3107 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
3108 if (ConstantInt *Elt =
3109 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
3110 const APInt &V = Elt->getValue();
3111 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
3112 Elements.push_back(ConstantInt::get(EltTy, V2));
3113 } else {
3114 Elements.push_back(ConstantInt::get(EltTy, 1));
3115 }
3116 }
3117 ShadowMul = ConstantVector::get(Elements);
3118 } else {
3119 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
3120 const APInt &V = Elt->getValue();
3121 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
3122 ShadowMul = ConstantInt::get(Ty, V2);
3123 } else {
3124 ShadowMul = ConstantInt::get(Ty, 1);
3125 }
3126 }
3127
3128 IRBuilder<> IRB(&I);
3129 setShadow(&I,
3130 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
3131 setOrigin(&I, getOrigin(OtherArg));
3132 }
3133
3134 void visitMul(BinaryOperator &I) {
3135 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
3136 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
3137 if (constOp0 && !constOp1)
3138 handleMulByConstant(I, constOp0, I.getOperand(1));
3139 else if (constOp1 && !constOp0)
3140 handleMulByConstant(I, constOp1, I.getOperand(0));
3141 else
3142 handleShadowOr(I);
3143 }
3144
3145 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
3146 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
3147 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
3148 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
3149 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
3150 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
3151
3152 void handleIntegerDiv(Instruction &I) {
3153 IRBuilder<> IRB(&I);
3154 // Strict on the second argument.
3155 insertCheckShadowOf(I.getOperand(1), &I);
3156 setShadow(&I, getShadow(&I, 0));
3157 setOrigin(&I, getOrigin(&I, 0));
3158 }
3159
3160 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
3161 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
3162 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
3163 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
3164
3165 // Floating-point division is side-effect free. We cannot require that the
3166 // divisor be fully initialized, so we must propagate shadow. See PR37523.
3167 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
3168 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
3169
3170 /// Instrument == and != comparisons.
3171 ///
3172 /// Sometimes the comparison result is known even if some of the bits of the
3173 /// arguments are not.
3174 void handleEqualityComparison(ICmpInst &I) {
3175 IRBuilder<> IRB(&I);
3176 Value *A = I.getOperand(0);
3177 Value *B = I.getOperand(1);
3178 Value *Sa = getShadow(A);
3179 Value *Sb = getShadow(B);
3180
3181 Value *Si = propagateEqualityComparison(IRB, A, B, Sa, Sb);
3182
3183 setShadow(&I, Si);
3184 setOriginForNaryOp(I);
3185 }
3186
3187 /// Instrument relational comparisons.
3188 ///
3189 /// This function does exact shadow propagation for all relational
3190 /// comparisons of integers, pointers and vectors of those.
3191 /// FIXME: output seems suboptimal when one of the operands is a constant
3192 void handleRelationalComparisonExact(ICmpInst &I) {
3193 IRBuilder<> IRB(&I);
3194 Value *A = I.getOperand(0);
3195 Value *B = I.getOperand(1);
3196 Value *Sa = getShadow(A);
3197 Value *Sb = getShadow(B);
3198
3199 // Get rid of pointers and vectors of pointers.
3200 // For ints (and vectors of ints), types of A and Sa match,
3201 // and this is a no-op.
3202 A = IRB.CreatePointerCast(A, Sa->getType());
3203 B = IRB.CreatePointerCast(B, Sb->getType());
3204
3205 // Let [a0, a1] be the interval of possible values of A, taking into account
3206 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3207 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
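// Worked example (illustrative values): if A = 0b10?? (concrete bits 10,
// two poisoned low bits), then [a0, a1] = [8, 11]. With a fully initialized
// B = 9 and an unsigned "<", (8 < 9) != (11 < 9), so the result is poisoned;
// with B = 5 both comparisons are false and the result is a well-defined
// "false" despite the poisoned bits.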
3208 bool IsSigned = I.isSigned();
3209
3210 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3211 if (IsSigned) {
3212 // Sign-flip to map from the signed range to the unsigned range. The relation
3213 // of A vs B is preserved if checked with `getUnsignedPredicate()`.
3214 // The relationship between Amin, Amax, Bmin and Bmax is also unaffected, as
3215 // they are created by effectively adding to / subtracting from A (or B) a
3216 // value derived from the shadow, with no overflow, either before or after
3217 // the sign flip.
3218 APInt MinVal =
3219 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3220 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3221 }
3222 // Minimize undefined bits.
3223 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3224 Value *Max = IRB.CreateOr(V, S);
3225 return std::make_pair(Min, Max);
3226 };
3227
3228 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3229 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3230 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3231 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3232
3233 Value *Si = IRB.CreateXor(S1, S2);
3234 setShadow(&I, Si);
3235 setOriginForNaryOp(I);
3236 }
3237
3238 /// Instrument signed relational comparisons.
3239 ///
3240 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3241 /// bit of the shadow. Everything else is delegated to handleShadowOr().
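/// As an illustrative sketch: for %c = icmp slt i32 %x, 0 only the sign bit of
/// %x matters, so the propagated shadow is
///   %sc = icmp slt i32 %sx, 0   ; poisoned iff the sign bit of %x is poisoned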
3242 void handleSignedRelationalComparison(ICmpInst &I) {
3243 Constant *constOp;
3244 Value *op = nullptr;
3245 CmpInst::Predicate pre;
3246 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3247 op = I.getOperand(0);
3248 pre = I.getPredicate();
3249 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3250 op = I.getOperand(1);
3251 pre = I.getSwappedPredicate();
3252 } else {
3253 handleShadowOr(I);
3254 return;
3255 }
3256
3257 if ((constOp->isNullValue() &&
3258 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3259 (constOp->isAllOnesValue() &&
3260 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3261 IRBuilder<> IRB(&I);
3262 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3263 "_msprop_icmp_s");
3264 setShadow(&I, Shadow);
3265 setOrigin(&I, getOrigin(op));
3266 } else {
3267 handleShadowOr(I);
3268 }
3269 }
3270
3271 void visitICmpInst(ICmpInst &I) {
3272 if (!ClHandleICmp) {
3273 handleShadowOr(I);
3274 return;
3275 }
3276 if (I.isEquality()) {
3277 handleEqualityComparison(I);
3278 return;
3279 }
3280
3281 assert(I.isRelational());
3282 if (ClHandleICmpExact) {
3283 handleRelationalComparisonExact(I);
3284 return;
3285 }
3286 if (I.isSigned()) {
3287 handleSignedRelationalComparison(I);
3288 return;
3289 }
3290
3291 assert(I.isUnsigned());
3292 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3293 handleRelationalComparisonExact(I);
3294 return;
3295 }
3296
3297 handleShadowOr(I);
3298 }
3299
3300 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3301
3302 void handleShift(BinaryOperator &I) {
3303 IRBuilder<> IRB(&I);
3304 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3305 // Otherwise perform the same shift on S1.
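// e.g., (illustrative) for %r = shl i32 %x, %n the propagated shadow is
//   (%sx shl %n) | (%sn != 0 ? -1 : 0)
// i.e. the operand shadow shifted by the same amount, fully poisoned if the
// shift amount itself is poisoned.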
3306 Value *S1 = getShadow(&I, 0);
3307 Value *S2 = getShadow(&I, 1);
3308 Value *S2Conv =
3309 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3310 Value *V2 = I.getOperand(1);
3311 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3312 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3313 setOriginForNaryOp(I);
3314 }
3315
3316 void visitShl(BinaryOperator &I) { handleShift(I); }
3317 void visitAShr(BinaryOperator &I) { handleShift(I); }
3318 void visitLShr(BinaryOperator &I) { handleShift(I); }
3319
3320 void handleFunnelShift(IntrinsicInst &I) {
3321 IRBuilder<> IRB(&I);
3322 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3323 // Otherwise perform the same shift on S0 and S1.
3324 Value *S0 = getShadow(&I, 0);
3325 Value *S1 = getShadow(&I, 1);
3326 Value *S2 = getShadow(&I, 2);
3327 Value *S2Conv =
3328 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3329 Value *V2 = I.getOperand(2);
3330 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3331 {S0, S1, V2});
3332 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3333 setOriginForNaryOp(I);
3334 }
3335
3336 /// Instrument llvm.memmove
3337 ///
3338 /// At this point we don't know if llvm.memmove will be inlined or not.
3339 /// If we don't instrument it and it gets inlined,
3340 /// our interceptor will not kick in and we will lose the memmove.
3341 /// If we instrument the call here, but it does not get inlined,
3342 /// we will memmove the shadow twice, which is bad in the case
3343 /// of overlapping regions. So we simply lower the intrinsic to a call.
3344 ///
3345 /// Similar situation exists for memcpy and memset.
3346 void visitMemMoveInst(MemMoveInst &I) {
3347 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3348 IRBuilder<> IRB(&I);
3349 IRB.CreateCall(MS.MemmoveFn,
3350 {I.getArgOperand(0), I.getArgOperand(1),
3351 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3352 I.eraseFromParent();
3353 }
3354
3355 /// Instrument memcpy
3356 ///
3357 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3358 /// unfortunate as it may slow down small constant memcpys.
3359 /// FIXME: consider doing manual inline for small constant sizes and proper
3360 /// alignment.
3361 ///
3362 /// Note: This also handles memcpy.inline, which promises no calls to external
3363 /// functions as an optimization. However, with instrumentation enabled this
3364 /// is difficult to promise; additionally, we know that the MSan runtime
3365 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3366 /// instrumentation it's safe to turn memcpy.inline into a call to
3367 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3368 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3369 void visitMemCpyInst(MemCpyInst &I) {
3370 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3371 IRBuilder<> IRB(&I);
3372 IRB.CreateCall(MS.MemcpyFn,
3373 {I.getArgOperand(0), I.getArgOperand(1),
3374 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3375 I.eraseFromParent();
3376 }
3377
3378 // Same as memcpy.
3379 void visitMemSetInst(MemSetInst &I) {
3380 IRBuilder<> IRB(&I);
3381 IRB.CreateCall(
3382 MS.MemsetFn,
3383 {I.getArgOperand(0),
3384 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3385 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3386 I.eraseFromParent();
3387 }
3388
3389 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3390
3391 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3392
3393 /// Handle vector store-like intrinsics.
3394 ///
3395 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3396 /// has 1 pointer argument and 1 vector argument, returns void.
3397 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3398 assert(I.arg_size() == 2);
3399
3400 IRBuilder<> IRB(&I);
3401 Value *Addr = I.getArgOperand(0);
3402 Value *Shadow = getShadow(&I, 1);
3403 Value *ShadowPtr, *OriginPtr;
3404
3405 // We don't know the pointer alignment (could be unaligned SSE store!).
3406 // Have to assume the worst case.
3407 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3408 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3409 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3410
3411 if (ClCheckAccessAddress)
3412 insertCheckShadowOf(Addr, &I);
3413
3414 // FIXME: factor out common code from materializeStores
3415 if (MS.TrackOrigins)
3416 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3417 return true;
3418 }
3419
3420 /// Handle vector load-like intrinsics.
3421 ///
3422 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3423 /// has 1 pointer argument, returns a vector.
3424 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3425 assert(I.arg_size() == 1);
3426
3427 IRBuilder<> IRB(&I);
3428 Value *Addr = I.getArgOperand(0);
3429
3430 Type *ShadowTy = getShadowTy(&I);
3431 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3432 if (PropagateShadow) {
3433 // We don't know the pointer alignment (could be unaligned SSE load!).
3434 // Have to assume the worst case.
3435 const Align Alignment = Align(1);
3436 std::tie(ShadowPtr, OriginPtr) =
3437 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3438 setShadow(&I,
3439 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3440 } else {
3441 setShadow(&I, getCleanShadow(&I));
3442 }
3443
3444 if (ClCheckAccessAddress)
3445 insertCheckShadowOf(Addr, &I);
3446
3447 if (MS.TrackOrigins) {
3448 if (PropagateShadow)
3449 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3450 else
3451 setOrigin(&I, getCleanOrigin());
3452 }
3453 return true;
3454 }
3455
3456 /// Handle (SIMD arithmetic)-like intrinsics.
3457 ///
3458 /// Instrument intrinsics with any number of arguments of the same type [*],
3459 /// equal to the return type, plus a specified number of trailing flags of
3460 /// any type.
3461 ///
3462 /// [*] The type should be simple (no aggregates or pointers; vectors are
3463 /// fine).
3464 ///
3465 /// Caller guarantees that this intrinsic does not access memory.
3466 ///
3467 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3468 /// by this handler. See horizontalReduce().
3469 ///
3470 /// TODO: permutation intrinsics are also often incorrectly matched.
3471 [[maybe_unused]] bool
3472 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3473 unsigned int trailingFlags) {
3474 Type *RetTy = I.getType();
3475 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3476 return false;
3477
3478 unsigned NumArgOperands = I.arg_size();
3479 assert(NumArgOperands >= trailingFlags);
3480 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3481 Type *Ty = I.getArgOperand(i)->getType();
3482 if (Ty != RetTy)
3483 return false;
3484 }
3485
3486 IRBuilder<> IRB(&I);
3487 ShadowAndOriginCombiner SC(this, IRB);
3488 for (unsigned i = 0; i < NumArgOperands; ++i)
3489 SC.Add(I.getArgOperand(i));
3490 SC.Done(&I);
3491
3492 return true;
3493 }
3494
3495 /// Returns whether it was able to heuristically instrument unknown
3496 /// intrinsics.
3497 ///
3498 /// The main purpose of this code is to do something reasonable with all
3499 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3500 /// We recognize several classes of intrinsics by their argument types and
3501 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3502 /// sure that we know what the intrinsic does.
3503 ///
3504 /// We special-case intrinsics where this approach fails. See llvm.bswap
3505 /// handling as an example of that.
3506 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3507 unsigned NumArgOperands = I.arg_size();
3508 if (NumArgOperands == 0)
3509 return false;
3510
3511 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3512 I.getArgOperand(1)->getType()->isVectorTy() &&
3513 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3514 // This looks like a vector store.
3515 return handleVectorStoreIntrinsic(I);
3516 }
3517
3518 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3519 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3520 // This looks like a vector load.
3521 return handleVectorLoadIntrinsic(I);
3522 }
3523
3524 if (I.doesNotAccessMemory())
3525 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3526 return true;
3527
3528 // FIXME: detect and handle SSE maskstore/maskload?
3529 // Some cases are now handled in handleAVXMasked{Load,Store}.
3530 return false;
3531 }
3532
3533 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3534 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3535 if (ClDumpStrictIntrinsics)
3536 dumpInst(I, "Heuristic");
3537
3538 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3539 << "\n");
3540 return true;
3541 } else
3542 return false;
3543 }
3544
3545 void handleInvariantGroup(IntrinsicInst &I) {
3546 setShadow(&I, getShadow(&I, 0));
3547 setOrigin(&I, getOrigin(&I, 0));
3548 }
3549
3550 void handleLifetimeStart(IntrinsicInst &I) {
3551 if (!PoisonStack)
3552 return;
3553 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3554 if (AI)
3555 LifetimeStartList.push_back(std::make_pair(&I, AI));
3556 }
3557
3558 void handleBswap(IntrinsicInst &I) {
3559 IRBuilder<> IRB(&I);
3560 Value *Op = I.getArgOperand(0);
3561 Type *OpType = Op->getType();
3562 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3563 getShadow(Op)));
3564 setOrigin(&I, getOrigin(Op));
3565 }
3566
3567 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3568 // and a 1. If the input is all zero, it is fully initialized iff
3569 // !is_zero_poison.
3570 //
3571 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3572 // concrete value 0/1, and ? is an uninitialized bit:
3573 // - 0001 0??? is fully initialized
3574 // - 000? ???? is fully uninitialized (*)
3575 // - ???? ???? is fully uninitialized
3576 // - 0000 0000 is fully uninitialized if is_zero_poison,
3577 // fully initialized otherwise
3578 //
3579 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3580 // only need to poison 4 bits.
3581 //
3582 // OutputShadow =
3583 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3584 // || (is_zero_poison && AllZeroSrc)
3585 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3586 IRBuilder<> IRB(&I);
3587 Value *Src = I.getArgOperand(0);
3588 Value *SrcShadow = getShadow(Src);
3589
3590 Value *False = IRB.getInt1(false);
3591 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3592 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3593 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3594 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3595
3596 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3597 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3598
3599 Value *NotAllZeroShadow =
3600 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3601 Value *OutputShadow =
3602 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3603
3604 // If zero poison is requested, mix in with the shadow
3605 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3606 if (!IsZeroPoison->isNullValue()) {
3607 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3608 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3609 }
3610
3611 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3612
3613 setShadow(&I, OutputShadow);
3614 setOriginForNaryOp(I);
3615 }
3616
3617 /// Some instructions have additional zero-elements in the return type
3618 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3619 ///
3620 /// This function will return a vector type with the same number of elements
3621 /// as the input, but the same per-element width as the return value, e.g.,
3622 /// <8 x i8>.
3623 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3624 assert(isa<FixedVectorType>(getShadowTy(&I)));
3625 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3626
3627 // TODO: generalize beyond 2x?
3628 if (ShadowType->getElementCount() ==
3629 cast<VectorType>(Src->getType())->getElementCount() * 2)
3630 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3631
3632 assert(ShadowType->getElementCount() ==
3633 cast<VectorType>(Src->getType())->getElementCount());
3634
3635 return ShadowType;
3636 }
3637
3638 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3639 /// to match the length of the shadow for the instruction.
3640 /// If scalar types of the vectors are different, it will use the type of the
3641 /// input vector.
3642 /// This is more type-safe than CreateShadowCast().
3643 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3644 IRBuilder<> IRB(&I);
3645 assert(isa<FixedVectorType>(Shadow->getType()));
3646 assert(isa<FixedVectorType>(I.getType()));
3647
3648 Value *FullShadow = getCleanShadow(&I);
3649 unsigned ShadowNumElems =
3650 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3651 unsigned FullShadowNumElems =
3652 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3653
3654 assert((ShadowNumElems == FullShadowNumElems) ||
3655 (ShadowNumElems * 2 == FullShadowNumElems));
3656
3657 if (ShadowNumElems == FullShadowNumElems) {
3658 FullShadow = Shadow;
3659 } else {
3660 // TODO: generalize beyond 2x?
3661 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3662 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3663
3664 // Append zeros
3665 FullShadow =
3666 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3667 }
3668
3669 return FullShadow;
3670 }
3671
3672 /// Handle x86 SSE vector conversion.
3673 ///
3674 /// e.g., single-precision to half-precision conversion:
3675 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3676 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3677 ///
3678 /// floating-point to integer:
3679 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3680 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3681 ///
3682 /// Note: if the output has more elements, they are zero-initialized (and
3683 /// therefore the shadow will also be initialized).
3684 ///
3685 /// This differs from handleSSEVectorConvertIntrinsic() because it
3686 /// propagates uninitialized shadow (instead of checking the shadow).
3687 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3688 bool HasRoundingMode) {
3689 if (HasRoundingMode) {
3690 assert(I.arg_size() == 2);
3691 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3692 assert(RoundingMode->getType()->isIntegerTy());
3693 } else {
3694 assert(I.arg_size() == 1);
3695 }
3696
3697 Value *Src = I.getArgOperand(0);
3698 assert(Src->getType()->isVectorTy());
3699
3700 // The return type might have more elements than the input.
3701 // Temporarily shrink the return type's number of elements.
3702 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3703
3704 IRBuilder<> IRB(&I);
3705 Value *S0 = getShadow(&I, 0);
3706
3707 /// For scalars:
3708 /// Since they are converting to and/or from floating-point, the output is:
3709 /// - fully uninitialized if *any* bit of the input is uninitialized
3710 /// - fully initialized if all bits of the input are initialized
3711 /// We apply the same principle on a per-field basis for vectors.
3712 Value *Shadow =
3713 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3714
3715 // The return type might have more elements than the input.
3716 // Extend the return type back to its original width if necessary.
3717 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3718
3719 setShadow(&I, FullShadow);
3720 setOriginForNaryOp(I);
3721 }
3722
3723 // Instrument x86 SSE vector convert intrinsic.
3724 //
3725 // This function instruments intrinsics like cvtsi2ss:
3726 // %Out = int_xxx_cvtyyy(%ConvertOp)
3727 // or
3728 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3729 // The intrinsic converts \p NumUsedElements elements of \p ConvertOp to the
3730 // same number of \p Out elements, and (if it has 2 arguments) copies the rest
3731 // of the elements from \p CopyOp.
3732 // In most cases the conversion involves a floating-point value, which may
3733 // trigger a hardware exception when not fully initialized. For this reason we
3734 // require \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3735 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3736 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3737 // return a fully initialized value.
3738 //
3739 // For Arm NEON vector convert intrinsics, see
3740 // handleNEONVectorConvertIntrinsic().
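// As an illustrative sketch: for
//   %r = call <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float> %copy, i32 %v)
// NumUsedElements == 1, so we check that the shadow of %v is fully clean (the
// value is converted and may trap), and the result shadow is the shadow of
// %copy with element 0 zeroed (that element is freshly produced and clean).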
3741 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3742 bool HasRoundingMode = false) {
3743 IRBuilder<> IRB(&I);
3744 Value *CopyOp, *ConvertOp;
3745
3746 assert((!HasRoundingMode ||
3747 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3748 "Invalid rounding mode");
3749
3750 switch (I.arg_size() - HasRoundingMode) {
3751 case 2:
3752 CopyOp = I.getArgOperand(0);
3753 ConvertOp = I.getArgOperand(1);
3754 break;
3755 case 1:
3756 ConvertOp = I.getArgOperand(0);
3757 CopyOp = nullptr;
3758 break;
3759 default:
3760 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3761 }
3762
3763 // The first *NumUsedElements* elements of ConvertOp are converted to the
3764 // same number of output elements. The rest of the output is copied from
3765 // CopyOp, or (if not available) filled with zeroes.
3766 // Combine shadow for elements of ConvertOp that are used in this operation,
3767 // and insert a check.
3768 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3769 // int->any conversion.
3770 Value *ConvertShadow = getShadow(ConvertOp);
3771 Value *AggShadow = nullptr;
3772 if (ConvertOp->getType()->isVectorTy()) {
3773 AggShadow = IRB.CreateExtractElement(
3774 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3775 for (int i = 1; i < NumUsedElements; ++i) {
3776 Value *MoreShadow = IRB.CreateExtractElement(
3777 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3778 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3779 }
3780 } else {
3781 AggShadow = ConvertShadow;
3782 }
3783 assert(AggShadow->getType()->isIntegerTy());
3784 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3785
3786 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3787 // ConvertOp.
3788 if (CopyOp) {
3789 assert(CopyOp->getType() == I.getType());
3790 assert(CopyOp->getType()->isVectorTy());
3791 Value *ResultShadow = getShadow(CopyOp);
3792 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3793 for (int i = 0; i < NumUsedElements; ++i) {
3794 ResultShadow = IRB.CreateInsertElement(
3795 ResultShadow, ConstantInt::getNullValue(EltTy),
3796 ConstantInt::get(IRB.getInt32Ty(), i));
3797 }
3798 setShadow(&I, ResultShadow);
3799 setOrigin(&I, getOrigin(CopyOp));
3800 } else {
3801 setShadow(&I, getCleanShadow(&I));
3802 setOrigin(&I, getCleanOrigin());
3803 }
3804 }
3805
3806 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3807 // zeroes if it is zero, and all ones otherwise.
3808 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3809 if (S->getType()->isVectorTy())
3810 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3811 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3812 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3813 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3814 }
3815
3816 // Given a vector, extract its first element, and return all
3817 // zeroes if it is zero, and all ones otherwise.
3818 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3819 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3820 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3821 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3822 }
3823
3824 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3825 Type *T = S->getType();
3826 assert(T->isVectorTy());
3827 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3828 return IRB.CreateSExt(S2, T);
3829 }
3830
3831 // Instrument vector shift intrinsic.
3832 //
3833 // This function instruments intrinsics like int_x86_avx2_psll_w.
3834 // Intrinsic shifts %In by %ShiftSize bits.
3835 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3836 // size, and the rest is ignored. Behavior is defined even if shift size is
3837 // greater than register (or field) width.
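// As an illustrative sketch: for @llvm.x86.avx2.psll.w(%In, %ShiftSize), the
// shadow of %In is shifted by the very same intrinsic (so shadow bits travel
// with data bits), and the result is additionally fully poisoned if any of the
// relevant bits of %ShiftSize are poisoned.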
3838 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3839 assert(I.arg_size() == 2);
3840 IRBuilder<> IRB(&I);
3841 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3842 // Otherwise perform the same shift on S1.
3843 Value *S1 = getShadow(&I, 0);
3844 Value *S2 = getShadow(&I, 1);
3845 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3846 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3847 Value *V1 = I.getOperand(0);
3848 Value *V2 = I.getOperand(1);
3849 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3850 {IRB.CreateBitCast(S1, V1->getType()), V2});
3851 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3852 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3853 setOriginForNaryOp(I);
3854 }
3855
3856 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3857 // vectors.
3858 Type *getMMXVectorTy(unsigned EltSizeInBits,
3859 unsigned X86_MMXSizeInBits = 64) {
3860 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3861 "Illegal MMX vector element size");
3862 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3863 X86_MMXSizeInBits / EltSizeInBits);
3864 }
3865
3866 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3867 // intrinsic.
3868 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3869 switch (id) {
3870 case Intrinsic::x86_sse2_packsswb_128:
3871 case Intrinsic::x86_sse2_packuswb_128:
3872 return Intrinsic::x86_sse2_packsswb_128;
3873
3874 case Intrinsic::x86_sse2_packssdw_128:
3875 case Intrinsic::x86_sse41_packusdw:
3876 return Intrinsic::x86_sse2_packssdw_128;
3877
3878 case Intrinsic::x86_avx2_packsswb:
3879 case Intrinsic::x86_avx2_packuswb:
3880 return Intrinsic::x86_avx2_packsswb;
3881
3882 case Intrinsic::x86_avx2_packssdw:
3883 case Intrinsic::x86_avx2_packusdw:
3884 return Intrinsic::x86_avx2_packssdw;
3885
3886 case Intrinsic::x86_mmx_packsswb:
3887 case Intrinsic::x86_mmx_packuswb:
3888 return Intrinsic::x86_mmx_packsswb;
3889
3890 case Intrinsic::x86_mmx_packssdw:
3891 return Intrinsic::x86_mmx_packssdw;
3892
3893 case Intrinsic::x86_avx512_packssdw_512:
3894 case Intrinsic::x86_avx512_packusdw_512:
3895 return Intrinsic::x86_avx512_packssdw_512;
3896
3897 case Intrinsic::x86_avx512_packsswb_512:
3898 case Intrinsic::x86_avx512_packuswb_512:
3899 return Intrinsic::x86_avx512_packsswb_512;
3900
3901 default:
3902 llvm_unreachable("unexpected intrinsic id");
3903 }
3904 }
3905
3906 // Instrument vector pack intrinsic.
3907 //
3908 // This function instruments intrinsics like x86_mmx_packsswb, that
3909 // packs elements of 2 input vectors into half as many bits with saturation.
3910 // Shadow is propagated with the signed variant of the same intrinsic applied
3911 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3912 // MMXEltSizeInBits is used only for x86mmx arguments.
3913 //
3914 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
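// As an illustrative sketch: each input lane's shadow is first mapped to
// all-ones (any poisoned bit) or all-zeros (clean); the signed saturating pack
// then keeps -1 as -1 and 0 as 0 in the narrowed lanes, so every output lane
// ends up fully poisoned or fully clean. (An unsigned pack would saturate -1
// to 0 and lose the poison, hence the signed counterpart above.)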
3915 void handleVectorPackIntrinsic(IntrinsicInst &I,
3916 unsigned MMXEltSizeInBits = 0) {
3917 assert(I.arg_size() == 2);
3918 IRBuilder<> IRB(&I);
3919 Value *S1 = getShadow(&I, 0);
3920 Value *S2 = getShadow(&I, 1);
3921 assert(S1->getType()->isVectorTy());
3922
3923 // SExt and ICmpNE below must apply to individual elements of input vectors.
3924 // In case of x86mmx arguments, cast them to appropriate vector types and
3925 // back.
3926 Type *T =
3927 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3928 if (MMXEltSizeInBits) {
3929 S1 = IRB.CreateBitCast(S1, T);
3930 S2 = IRB.CreateBitCast(S2, T);
3931 }
3932 Value *S1_ext =
3933 IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3934 Value *S2_ext =
3935 IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3936 if (MMXEltSizeInBits) {
3937 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3938 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3939 }
3940
3941 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3942 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3943 "_msprop_vector_pack");
3944 if (MMXEltSizeInBits)
3945 S = IRB.CreateBitCast(S, getShadowTy(&I));
3946 setShadow(&I, S);
3947 setOriginForNaryOp(I);
3948 }
3949
3950 // Convert `Mask` into `<n x i1>`.
3951 Constant *createDppMask(unsigned Width, unsigned Mask) {
3952 SmallVector<Constant *, 4> R(Width);
3953 for (auto &M : R) {
3954 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3955 Mask >>= 1;
3956 }
3957 return ConstantVector::get(R);
3958 }
3959
3960 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3961 // arg is poisoned, entire dot product is poisoned.
3962 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3963 unsigned DstMask) {
3964 const unsigned Width =
3965 cast<FixedVectorType>(S->getType())->getNumElements();
3966
3967 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3968 Constant::getNullValue(S->getType()));
3969 Value *SElem = IRB.CreateOrReduce(S);
3970 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3971 Value *DstMaskV = createDppMask(Width, DstMask);
3972
3973 return IRB.CreateSelect(
3974 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3975 }
3976
3977 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3978 //
3979 // The 2- and 4-element versions produce a single scalar dot product and then
3980 // put it into the elements of the output vector selected by the 4 lowest bits
3981 // of the mask. The top 4 bits of the mask control which elements of the
3982 // inputs are used for the dot product.
3983 //
3984 // The 8-element version's mask still has only 4 bits for input and 4 bits for
3985 // the output mask. According to the spec it simply operates as the 4-element
3986 // version on the first 4 elements of the inputs and output, and then on the
3987 // last 4 elements of the inputs and output.
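// Illustrative example mask: with mask 0x31, SrcMask = 0x3 (use input elements
// 0 and 1 for the dot product) and DstMask = 0x1 (write the result only into
// output element 0); masked-off output elements are zeroed and therefore get
// clean shadow.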
3988 void handleDppIntrinsic(IntrinsicInst &I) {
3989 IRBuilder<> IRB(&I);
3990
3991 Value *S0 = getShadow(&I, 0);
3992 Value *S1 = getShadow(&I, 1);
3993 Value *S = IRB.CreateOr(S0, S1);
3994
3995 const unsigned Width =
3996 cast<FixedVectorType>(S->getType())->getNumElements();
3997 assert(Width == 2 || Width == 4 || Width == 8);
3998
3999 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4000 const unsigned SrcMask = Mask >> 4;
4001 const unsigned DstMask = Mask & 0xf;
4002
4003 // Calculate shadow as `<n x i1>`.
4004 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
4005 if (Width == 8) {
4006 // First 4 elements of shadow are already calculated. `findDppPoisonedOutput`
4007 // operates on 32-bit masks, so we can just shift the masks and repeat.
4008 SI1 = IRB.CreateOr(
4009 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
4010 }
4011 // Extend to real size of shadow, poisoning either all or none bits of an
4012 // element.
4013 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
4014
4015 setShadow(&I, S);
4016 setOriginForNaryOp(I);
4017 }
4018
4019 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
4020 C = CreateAppToShadowCast(IRB, C);
4021 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
4022 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
4023 C = IRB.CreateAShr(C, ElSize - 1);
4024 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
4025 return IRB.CreateTrunc(C, FVT);
4026 }
4027
4028 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
4029 void handleBlendvIntrinsic(IntrinsicInst &I) {
4030 Value *C = I.getOperand(2);
4031 Value *T = I.getOperand(1);
4032 Value *F = I.getOperand(0);
4033
4034 Value *Sc = getShadow(&I, 2);
4035 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
4036
4037 {
4038 IRBuilder<> IRB(&I);
4039 // Extract top bit from condition and its shadow.
4040 C = convertBlendvToSelectMask(IRB, C);
4041 Sc = convertBlendvToSelectMask(IRB, Sc);
4042
4043 setShadow(C, Sc);
4044 setOrigin(C, Oc);
4045 }
4046
4047 handleSelectLikeInst(I, C, T, F);
4048 }
4049
4050 // Instrument sum-of-absolute-differences intrinsic.
4051 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
4052 const unsigned SignificantBitsPerResultElement = 16;
4053 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
4054 unsigned ZeroBitsPerResultElement =
4055 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
4056
4057 IRBuilder<> IRB(&I);
4058 auto *Shadow0 = getShadow(&I, 0);
4059 auto *Shadow1 = getShadow(&I, 1);
4060 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4061 S = IRB.CreateBitCast(S, ResTy);
4062 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
4063 ResTy);
4064 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
4065 S = IRB.CreateBitCast(S, getShadowTy(&I));
4066 setShadow(&I, S);
4067 setOriginForNaryOp(I);
4068 }
4069
4070 // Instrument dot-product / multiply-add(-accumulate)? intrinsics.
4071 //
4072 // e.g., Two operands:
4073 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
4074 //
4075 // Two operands which require an EltSizeInBits override:
4076 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
4077 //
4078 // Three operands:
4079 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
4080 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
4081 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
4082 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
4083 // (these are equivalent to multiply-add on %a and %b, followed by
4084 // adding/"accumulating" %s. "Accumulation" stores the result in one
4085 // of the source registers, but this accumulate vs. add distinction
4086 // is lost when dealing with LLVM intrinsics.)
4087 //
4088 // ZeroPurifies means that multiplying a known-zero with an uninitialized
4089 // value results in an initialized value. This is applicable for integer
4090 // multiplication, but not floating-point (counter-example: NaN).
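// As an illustrative sketch: for
//   <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
// each output element is a[2i]*b[2i] + a[2i+1]*b[2i+1]. An output element is
// poisoned iff either contributing product is poisoned, and a product is clean
// if both factors are fully initialized or (ZeroPurifies) one factor is a
// fully initialized zero.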
4091 void handleVectorDotProductIntrinsic(IntrinsicInst &I,
4092 unsigned ReductionFactor,
4093 bool ZeroPurifies,
4094 unsigned EltSizeInBits,
4095 enum OddOrEvenLanes Lanes) {
4096 IRBuilder<> IRB(&I);
4097
4098 [[maybe_unused]] FixedVectorType *ReturnType =
4099 cast<FixedVectorType>(I.getType());
4100 assert(isa<FixedVectorType>(ReturnType));
4101
4102 // Vectors A and B, and shadows
4103 Value *Va = nullptr;
4104 Value *Vb = nullptr;
4105 Value *Sa = nullptr;
4106 Value *Sb = nullptr;
4107
4108 assert(I.arg_size() == 2 || I.arg_size() == 3);
4109 if (I.arg_size() == 2) {
4110 assert(Lanes == kBothLanes);
4111
4112 Va = I.getOperand(0);
4113 Vb = I.getOperand(1);
4114
4115 Sa = getShadow(&I, 0);
4116 Sb = getShadow(&I, 1);
4117 } else if (I.arg_size() == 3) {
4118 // Operand 0 is the accumulator. We will deal with that below.
4119 Va = I.getOperand(1);
4120 Vb = I.getOperand(2);
4121
4122 Sa = getShadow(&I, 1);
4123 Sb = getShadow(&I, 2);
4124
4125 if (Lanes == kEvenLanes || Lanes == kOddLanes) {
4126 // Convert < S0, S1, S2, S3, S4, S5, S6, S7 >
4127 // to < S0, S0, S2, S2, S4, S4, S6, S6 > (if even)
4128 // to < S1, S1, S3, S3, S5, S5, S7, S7 > (if odd)
4129 //
4130 // Note: for aarch64.neon.bfmlalb/t, the odd/even-indexed values are
4131 // zeroed, not duplicated. However, for shadow propagation, this
4132 // distinction is unimportant because Step 1 below will squeeze
4133 // each pair of elements (e.g., [S0, S0]) into a single bit, and
4134 // we only care if it is fully initialized.
4135
4136 FixedVectorType *InputShadowType = cast<FixedVectorType>(Sa->getType());
4137 unsigned Width = InputShadowType->getNumElements();
4138
4139 Sa = IRB.CreateShuffleVector(
4140 Sa, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4141 Sb = IRB.CreateShuffleVector(
4142 Sb, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4143 }
4144 }
4145
4146 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
4147 assert(ParamType == Vb->getType());
4148
4149 assert(ParamType->getPrimitiveSizeInBits() ==
4150 ReturnType->getPrimitiveSizeInBits());
4151
4152 if (I.arg_size() == 3) {
4153 [[maybe_unused]] auto *AccumulatorType =
4154 cast<FixedVectorType>(I.getOperand(0)->getType());
4155 assert(AccumulatorType == ReturnType);
4156 }
4157
4158 FixedVectorType *ImplicitReturnType =
4159 cast<FixedVectorType>(getShadowTy(ReturnType));
4160 // Step 1: instrument multiplication of corresponding vector elements
4161 if (EltSizeInBits) {
4162 ImplicitReturnType = cast<FixedVectorType>(
4163 getMMXVectorTy(EltSizeInBits * ReductionFactor,
4164 ParamType->getPrimitiveSizeInBits()));
4165 ParamType = cast<FixedVectorType>(
4166 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
4167
4168 Va = IRB.CreateBitCast(Va, ParamType);
4169 Vb = IRB.CreateBitCast(Vb, ParamType);
4170
4171 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
4172 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
4173 } else {
4174 assert(ParamType->getNumElements() ==
4175 ReturnType->getNumElements() * ReductionFactor);
4176 }
4177
4178 // Each element of the vector is represented by a single bit (poisoned or
4179 // not) e.g., <8 x i1>.
4180 Value *SaNonZero = IRB.CreateIsNotNull(Sa);
4181 Value *SbNonZero = IRB.CreateIsNotNull(Sb);
4182 Value *And;
4183 if (ZeroPurifies) {
4184 // Multiplying an *initialized* zero by an uninitialized element results
4185 // in an initialized zero element.
4186 //
4187 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4188 // results in an unpoisoned value.
4189 Value *VaInt = Va;
4190 Value *VbInt = Vb;
4191 if (!Va->getType()->isIntegerTy()) {
4192 VaInt = CreateAppToShadowCast(IRB, Va);
4193 VbInt = CreateAppToShadowCast(IRB, Vb);
4194 }
4195
4196 // We check for non-zero on a per-element basis, not per-bit.
4197 Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
4198 Value *VbNonZero = IRB.CreateIsNotNull(VbInt);
4199
4200 And = handleBitwiseAnd(IRB, VaNonZero, VbNonZero, SaNonZero, SbNonZero);
4201 } else {
4202 And = IRB.CreateOr({SaNonZero, SbNonZero});
4203 }
4204
4205 // Extend <8 x i1> to <8 x i16>.
4206 // (The real pmadd intrinsic would have computed intermediate values of
4207 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4208 // consider each element to be either fully initialized or fully
4209 // uninitialized.)
4210 And = IRB.CreateSExt(And, Sa->getType());
4211
4212 // Step 2: instrument horizontal add
4213 // We don't need bit-precise horizontalReduce because we only want to check
4214 // if each pair/quad of elements is fully zero.
4215 // Cast to <4 x i32>.
4216 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4217
4218 // Compute <4 x i1>, then extend back to <4 x i32>.
4219 Value *OutShadow = IRB.CreateSExt(
4220 IRB.CreateICmpNE(Horizontal,
4221 Constant::getNullValue(Horizontal->getType())),
4222 ImplicitReturnType);
4223
4224 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4225 // AVX, it is already correct).
4226 if (EltSizeInBits)
4227 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4228
4229 // Step 3 (if applicable): instrument accumulator
4230 if (I.arg_size() == 3)
4231 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4232
4233 setShadow(&I, OutShadow);
4234 setOriginForNaryOp(I);
4235 }
4236
4237 // Instrument compare-packed intrinsic.
4238 //
4239 // x86 has the predicate as the third operand, which is ImmArg e.g.,
4240 // - <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double>, <4 x double>, i8)
4241 // - <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double>, <2 x double>, i8)
4242 //
4243 // while Arm has separate intrinsics for >= and > e.g.,
4244 // - <2 x i32> @llvm.aarch64.neon.facge.v2i32.v2f32
4245 // (<2 x float> %A, <2 x float>)
4246 // - <2 x i32> @llvm.aarch64.neon.facgt.v2i32.v2f32
4247 // (<2 x float> %A, <2 x float>)
4248 //
4249 // Bonus: this also handles scalar cases e.g.,
4250 // - i32 @llvm.aarch64.neon.facgt.i32.f32(float %A, float %B)
4251 void handleVectorComparePackedIntrinsic(IntrinsicInst &I,
4252 bool PredicateAsOperand) {
4253 if (PredicateAsOperand) {
4254 assert(I.arg_size() == 3);
4255 assert(I.paramHasAttr(2, Attribute::ImmArg));
4256 } else
4257 assert(I.arg_size() == 2);
4258
4259 IRBuilder<> IRB(&I);
4260
4261 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4262 // all-ones shadow.
4263 Type *ResTy = getShadowTy(&I);
4264 auto *Shadow0 = getShadow(&I, 0);
4265 auto *Shadow1 = getShadow(&I, 1);
4266 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4267 Value *S = IRB.CreateSExt(
4268 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4269 setShadow(&I, S);
4270 setOriginForNaryOp(I);
4271 }
4272
4273 // Instrument compare-scalar intrinsic.
4274 // This handles both cmp* intrinsics which return the result in the first
4275 // element of a vector, and comi* which return the result as i32.
4276 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4277 IRBuilder<> IRB(&I);
4278 auto *Shadow0 = getShadow(&I, 0);
4279 auto *Shadow1 = getShadow(&I, 1);
4280 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4281 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4282 setShadow(&I, S);
4283 setOriginForNaryOp(I);
4284 }
4285
4286 // Instrument generic vector reduction intrinsics
4287 // by ORing together all their fields.
4288 //
4289 // If AllowShadowCast is true, the return type does not need to be the same
4290 // type as the fields
4291 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4292 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4293 assert(I.arg_size() == 1);
4294
4295 IRBuilder<> IRB(&I);
4296 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4297 if (AllowShadowCast)
4298 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4299 else
4300 assert(S->getType() == getShadowTy(&I));
4301 setShadow(&I, S);
4302 setOriginForNaryOp(I);
4303 }
4304
4305 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4306 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4307 // %a1)
4308 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4309 //
4310 // The type of the return value, initial starting value, and elements of the
4311 // vector must be identical.
4312 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4313 assert(I.arg_size() == 2);
4314
4315 IRBuilder<> IRB(&I);
4316 Value *Shadow0 = getShadow(&I, 0);
4317 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4318 assert(Shadow0->getType() == Shadow1->getType());
4319 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4320 assert(S->getType() == getShadowTy(&I));
4321 setShadow(&I, S);
4322 setOriginForNaryOp(I);
4323 }
4324
4325 // Instrument vector.reduce.or intrinsic.
4326 // Valid (non-poisoned) set bits in the operand pull low the
4327 // corresponding shadow bits.
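// Illustrative example: OR-reducing the i1 fields {1 (clean), ? (poisoned)}
// yields 1 regardless of the unknown bit, so the result is clean; reducing
// {0 (clean), ? (poisoned)} is poisoned because the unknown bit decides the
// outcome.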
4328 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4329 assert(I.arg_size() == 1);
4330
4331 IRBuilder<> IRB(&I);
4332 Value *OperandShadow = getShadow(&I, 0);
4333 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4334 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4335 // Bit N is clean if any field's bit N is 1 and unpoisoned
4336 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4337 // Otherwise, it is clean if every field's bit N is unpoisoned
4338 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4339 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4340
4341 setShadow(&I, S);
4342 setOrigin(&I, getOrigin(&I, 0));
4343 }
4344
4345 // Instrument vector.reduce.and intrinsic.
4346 // Valid (non-poisoned) unset bits in the operand pull down the
4347 // corresponding shadow bits.
4348 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4349 assert(I.arg_size() == 1);
4350
4351 IRBuilder<> IRB(&I);
4352 Value *OperandShadow = getShadow(&I, 0);
4353 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4354 // Bit N is clean if any field's bit N is 0 and unpoisoned
4355 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4356 // Otherwise, it is clean if every field's bit N is unpoisoned
4357 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4358 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4359
4360 setShadow(&I, S);
4361 setOrigin(&I, getOrigin(&I, 0));
4362 }
4363
4364 void handleStmxcsr(IntrinsicInst &I) {
4365 IRBuilder<> IRB(&I);
4366 Value *Addr = I.getArgOperand(0);
4367 Type *Ty = IRB.getInt32Ty();
4368 Value *ShadowPtr =
4369 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4370
4371 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4372
4373 if (ClCheckAccessAddress)
4374 insertCheckShadowOf(Addr, &I);
4375 }
4376
4377 void handleLdmxcsr(IntrinsicInst &I) {
4378 if (!InsertChecks)
4379 return;
4380
4381 IRBuilder<> IRB(&I);
4382 Value *Addr = I.getArgOperand(0);
4383 Type *Ty = IRB.getInt32Ty();
4384 const Align Alignment = Align(1);
4385 Value *ShadowPtr, *OriginPtr;
4386 std::tie(ShadowPtr, OriginPtr) =
4387 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4388
4389 if (ClCheckAccessAddress)
4390 insertCheckShadowOf(Addr, &I);
4391
4392 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4393 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4394 : getCleanOrigin();
4395 insertCheckShadow(Shadow, Origin, &I);
4396 }
4397
4398 void handleMaskedExpandLoad(IntrinsicInst &I) {
4399 IRBuilder<> IRB(&I);
4400 Value *Ptr = I.getArgOperand(0);
4401 MaybeAlign Align = I.getParamAlign(0);
4402 Value *Mask = I.getArgOperand(1);
4403 Value *PassThru = I.getArgOperand(2);
4404
4405 if (ClCheckAccessAddress) {
4406 insertCheckShadowOf(Ptr, &I);
4407 insertCheckShadowOf(Mask, &I);
4408 }
4409
4410 if (!PropagateShadow) {
4411 setShadow(&I, getCleanShadow(&I));
4412 setOrigin(&I, getCleanOrigin());
4413 return;
4414 }
4415
4416 Type *ShadowTy = getShadowTy(&I);
4417 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4418 auto [ShadowPtr, OriginPtr] =
4419 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4420
4421 Value *Shadow =
4422 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4423 getShadow(PassThru), "_msmaskedexpload");
4424
4425 setShadow(&I, Shadow);
4426
4427 // TODO: Store origins.
4428 setOrigin(&I, getCleanOrigin());
4429 }
4430
4431 void handleMaskedCompressStore(IntrinsicInst &I) {
4432 IRBuilder<> IRB(&I);
4433 Value *Values = I.getArgOperand(0);
4434 Value *Ptr = I.getArgOperand(1);
4435 MaybeAlign Align = I.getParamAlign(1);
4436 Value *Mask = I.getArgOperand(2);
4437
4438 if (ClCheckAccessAddress) {
4439 insertCheckShadowOf(Ptr, &I);
4440 insertCheckShadowOf(Mask, &I);
4441 }
4442
4443 Value *Shadow = getShadow(Values);
4444 Type *ElementShadowTy =
4445 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4446 auto [ShadowPtr, OriginPtrs] =
4447 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4448
4449 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4450
4451 // TODO: Store origins.
4452 }
4453
4454 void handleMaskedGather(IntrinsicInst &I) {
4455 IRBuilder<> IRB(&I);
4456 Value *Ptrs = I.getArgOperand(0);
4457 const Align Alignment = I.getParamAlign(0).valueOrOne();
4458 Value *Mask = I.getArgOperand(1);
4459 Value *PassThru = I.getArgOperand(2);
4460
4461 Type *PtrsShadowTy = getShadowTy(Ptrs);
4462 if (ClCheckAccessAddress) {
4463 insertCheckShadowOf(Mask, &I);
4464 Value *MaskedPtrShadow = IRB.CreateSelect(
4465 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4466 "_msmaskedptrs");
4467 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4468 }
4469
4470 if (!PropagateShadow) {
4471 setShadow(&I, getCleanShadow(&I));
4472 setOrigin(&I, getCleanOrigin());
4473 return;
4474 }
4475
4476 Type *ShadowTy = getShadowTy(&I);
4477 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4478 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4479 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4480
4481 Value *Shadow =
4482 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4483 getShadow(PassThru), "_msmaskedgather");
4484
4485 setShadow(&I, Shadow);
4486
4487 // TODO: Store origins.
4488 setOrigin(&I, getCleanOrigin());
4489 }
4490
4491 void handleMaskedScatter(IntrinsicInst &I) {
4492 IRBuilder<> IRB(&I);
4493 Value *Values = I.getArgOperand(0);
4494 Value *Ptrs = I.getArgOperand(1);
4495 const Align Alignment = I.getParamAlign(1).valueOrOne();
4496 Value *Mask = I.getArgOperand(2);
4497
4498 Type *PtrsShadowTy = getShadowTy(Ptrs);
4499 if (ClCheckAccessAddress) {
4500 insertCheckShadowOf(Mask, &I);
4501 Value *MaskedPtrShadow = IRB.CreateSelect(
4502 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4503 "_msmaskedptrs");
4504 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4505 }
4506
4507 Value *Shadow = getShadow(Values);
4508 Type *ElementShadowTy =
4509 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4510 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4511 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4512
4513 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4514
4515 // TODO: Store origin.
4516 }
4517
4518 // Intrinsic::masked_store
4519 //
4520 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4521 // stores are lowered to Intrinsic::masked_store.
4522 void handleMaskedStore(IntrinsicInst &I) {
4523 IRBuilder<> IRB(&I);
4524 Value *V = I.getArgOperand(0);
4525 Value *Ptr = I.getArgOperand(1);
4526 const Align Alignment = I.getParamAlign(1).valueOrOne();
4527 Value *Mask = I.getArgOperand(2);
4528 Value *Shadow = getShadow(V);
4529
4529
4530 if (ClCheckAccessAddress) {
4531 insertCheckShadowOf(Ptr, &I);
4532 insertCheckShadowOf(Mask, &I);
4533 }
4534
4535 Value *ShadowPtr;
4536 Value *OriginPtr;
4537 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4538 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4539
4540 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4541
4542 if (!MS.TrackOrigins)
4543 return;
4544
4545 auto &DL = F.getDataLayout();
4546 paintOrigin(IRB, getOrigin(V), OriginPtr,
4547 DL.getTypeStoreSize(Shadow->getType()),
4548 std::max(Alignment, kMinOriginAlignment));
4549 }
4550
4551 // Intrinsic::masked_load
4552 //
4553 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4554 // loads are lowered to Intrinsic::masked_load.
4555 void handleMaskedLoad(IntrinsicInst &I) {
4556 IRBuilder<> IRB(&I);
4557 Value *Ptr = I.getArgOperand(0);
4558 const Align Alignment = I.getParamAlign(0).valueOrOne();
4559 Value *Mask = I.getArgOperand(1);
4560 Value *PassThru = I.getArgOperand(2);
4561
4562 if (ClCheckAccessAddress) {
4563 insertCheckShadowOf(Ptr, &I);
4564 insertCheckShadowOf(Mask, &I);
4565 }
4566
4567 if (!PropagateShadow) {
4568 setShadow(&I, getCleanShadow(&I));
4569 setOrigin(&I, getCleanOrigin());
4570 return;
4571 }
4572
4573 Type *ShadowTy = getShadowTy(&I);
4574 Value *ShadowPtr, *OriginPtr;
4575 std::tie(ShadowPtr, OriginPtr) =
4576 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4577 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4578 getShadow(PassThru), "_msmaskedld"));
4579
4580 if (!MS.TrackOrigins)
4581 return;
4582
4583 // Choose between PassThru's and the loaded value's origins.
4584 Value *MaskedPassThruShadow = IRB.CreateAnd(
4585 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4586
4587 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4588
4589 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4590 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4591
4592 setOrigin(&I, Origin);
4593 }
4594
4595 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4596 // dst mask src
4597 //
4598 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4599 // by handleMaskedStore.
4600 //
4601 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4602 // vector of integers, unlike the LLVM masked intrinsics, which require a
4603 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4604 // mentions that the x86 backend does not know how to efficiently convert
4605 // from a vector of booleans back into the AVX mask format; therefore, they
4606 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4607 // intrinsics.
4608 void handleAVXMaskedStore(IntrinsicInst &I) {
4609 assert(I.arg_size() == 3);
4610
4611 IRBuilder<> IRB(&I);
4612
4613 Value *Dst = I.getArgOperand(0);
4614 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4615
4616 Value *Mask = I.getArgOperand(1);
4617 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4618
4619 Value *Src = I.getArgOperand(2);
4620 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4621
4622 const Align Alignment = Align(1);
4623
4624 Value *SrcShadow = getShadow(Src);
4625
4626     if (ClCheckAccessAddress) {
4627       insertCheckShadowOf(Dst, &I);
4628 insertCheckShadowOf(Mask, &I);
4629 }
4630
4631 Value *DstShadowPtr;
4632 Value *DstOriginPtr;
4633 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4634 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4635
4636 SmallVector<Value *, 2> ShadowArgs;
4637 ShadowArgs.append(1, DstShadowPtr);
4638 ShadowArgs.append(1, Mask);
4639 // The intrinsic may require floating-point but shadows can be arbitrary
4640 // bit patterns, of which some would be interpreted as "invalid"
4641 // floating-point values (NaN etc.); we assume the intrinsic will happily
4642 // copy them.
4643 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4644
4645 CallInst *CI =
4646 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4647 setShadow(&I, CI);
4648
4649 if (!MS.TrackOrigins)
4650 return;
4651
4652 // Approximation only
4653 auto &DL = F.getDataLayout();
4654 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4655 DL.getTypeStoreSize(SrcShadow->getType()),
4656 std::max(Alignment, kMinOriginAlignment));
4657 }
4658
4659 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4660 // return src mask
4661 //
4662 // Masked-off values are replaced with 0, which conveniently also represents
4663 // initialized memory.
4664 //
4665   // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4666   // by handleMaskedLoad.
4667 //
4668 // We do not combine this with handleMaskedLoad; see comment in
4669 // handleAVXMaskedStore for the rationale.
4670 //
4671 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4672 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4673 // parameter.
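  //
  // Shadow sketch (informal): we reuse the same maskload intrinsic to load
  // the shadow from src's shadow memory under the original mask; masked-off
  // shadow lanes come back as 0 (clean), matching the zeroed result lanes.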
4674 void handleAVXMaskedLoad(IntrinsicInst &I) {
4675 assert(I.arg_size() == 2);
4676
4677 IRBuilder<> IRB(&I);
4678
4679 Value *Src = I.getArgOperand(0);
4680 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4681
4682 Value *Mask = I.getArgOperand(1);
4683 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4684
4685 const Align Alignment = Align(1);
4686
4687     if (ClCheckAccessAddress) {
4688       insertCheckShadowOf(Mask, &I);
4689 }
4690
4691 Type *SrcShadowTy = getShadowTy(Src);
4692 Value *SrcShadowPtr, *SrcOriginPtr;
4693 std::tie(SrcShadowPtr, SrcOriginPtr) =
4694 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4695
4696 SmallVector<Value *, 2> ShadowArgs;
4697 ShadowArgs.append(1, SrcShadowPtr);
4698 ShadowArgs.append(1, Mask);
4699
4700 CallInst *CI =
4701 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4702 // The AVX masked load intrinsics do not have integer variants. We use the
4703 // floating-point variants, which will happily copy the shadows even if
4704 // they are interpreted as "invalid" floating-point values (NaN etc.).
4705 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4706
4707 if (!MS.TrackOrigins)
4708 return;
4709
4710 // The "pass-through" value is always zero (initialized). To the extent
4711 // that that results in initialized aligned 4-byte chunks, the origin value
4712 // is ignored. It is therefore correct to simply copy the origin from src.
4713 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4714 setOrigin(&I, PtrSrcOrigin);
4715 }
4716
4717 // Test whether the mask indices are initialized, only checking the bits that
4718 // are actually used.
4719 //
4720 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4721 // used/checked.
4722 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4723 assert(isFixedIntVector(Idx));
4724 auto IdxVectorSize =
4725 cast<FixedVectorType>(Idx->getType())->getNumElements();
4726 assert(isPowerOf2_64(IdxVectorSize));
4727
4728 // Compiler isn't smart enough, let's help it
4729 if (isa<Constant>(Idx))
4730 return;
4731
4732 auto *IdxShadow = getShadow(Idx);
4733 Value *Truncated = IRB.CreateTrunc(
4734 IdxShadow,
4735 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4736 IdxVectorSize));
4737 insertCheckShadow(Truncated, getOrigin(Idx), I);
4738 }
4739
4740 // Instrument AVX permutation intrinsic.
4741 // We apply the same permutation (argument index 1) to the shadow.
4742 void handleAVXVpermilvar(IntrinsicInst &I) {
4743 IRBuilder<> IRB(&I);
4744 Value *Shadow = getShadow(&I, 0);
4745 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4746
4747 // Shadows are integer-ish types but some intrinsics require a
4748 // different (e.g., floating-point) type.
4749 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4750 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4751 {Shadow, I.getArgOperand(1)});
4752
4753 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4754 setOriginForNaryOp(I);
4755 }
4756
4757 // Instrument AVX permutation intrinsic.
4758 // We apply the same permutation (argument index 1) to the shadows.
4759 void handleAVXVpermi2var(IntrinsicInst &I) {
4760 assert(I.arg_size() == 3);
4761 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4762 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4763 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4764 [[maybe_unused]] auto ArgVectorSize =
4765 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4766 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4767 ->getNumElements() == ArgVectorSize);
4768 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4769 ->getNumElements() == ArgVectorSize);
4770 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4771 assert(I.getType() == I.getArgOperand(0)->getType());
4772 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4773 IRBuilder<> IRB(&I);
4774 Value *AShadow = getShadow(&I, 0);
4775 Value *Idx = I.getArgOperand(1);
4776 Value *BShadow = getShadow(&I, 2);
4777
4778 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4779
4780 // Shadows are integer-ish types but some intrinsics require a
4781 // different (e.g., floating-point) type.
4782 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4783 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4784 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4785 {AShadow, Idx, BShadow});
4786 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4787 setOriginForNaryOp(I);
4788 }
4789
4790 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4791 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4792 }
4793
4794 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4795 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4796 }
4797
4798 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4799 return isFixedIntVectorTy(V->getType());
4800 }
4801
4802 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4803 return isFixedFPVectorTy(V->getType());
4804 }
4805
4806 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4807 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4808 // i32 rounding)
4809 //
4810 // Inconveniently, some similar intrinsics have a different operand order:
4811 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4812 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4813 // i16 mask)
4814 //
4815 // If the return type has more elements than A, the excess elements are
4816 // zeroed (and the corresponding shadow is initialized).
4817 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4818 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4819 // i8 mask)
4820 //
4821 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4822 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4823 // where all_or_nothing(x) is fully uninitialized if x has any
4824 // uninitialized bits
4825 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4826 IRBuilder<> IRB(&I);
4827
4828 assert(I.arg_size() == 4);
4829 Value *A = I.getOperand(0);
4830 Value *WriteThrough;
4831 Value *Mask;
4832     Value *RoundingMode;
4833     if (LastMask) {
4834 WriteThrough = I.getOperand(2);
4835 Mask = I.getOperand(3);
4836 RoundingMode = I.getOperand(1);
4837 } else {
4838 WriteThrough = I.getOperand(1);
4839 Mask = I.getOperand(2);
4840 RoundingMode = I.getOperand(3);
4841 }
4842
4843 assert(isFixedFPVector(A));
4844 assert(isFixedIntVector(WriteThrough));
4845
4846 unsigned ANumElements =
4847 cast<FixedVectorType>(A->getType())->getNumElements();
4848 [[maybe_unused]] unsigned WriteThruNumElements =
4849 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4850 assert(ANumElements == WriteThruNumElements ||
4851 ANumElements * 2 == WriteThruNumElements);
4852
4853 assert(Mask->getType()->isIntegerTy());
4854 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4855 assert(ANumElements == MaskNumElements ||
4856 ANumElements * 2 == MaskNumElements);
4857
4858 assert(WriteThruNumElements == MaskNumElements);
4859
4860 // Some bits of the mask may be unused, though it's unusual to have partly
4861 // uninitialized bits.
4862 insertCheckShadowOf(Mask, &I);
4863
4864 assert(RoundingMode->getType()->isIntegerTy());
4865 // Only some bits of the rounding mode are used, though it's very
4866 // unusual to have uninitialized bits there (more commonly, it's a
4867 // constant).
4868 insertCheckShadowOf(RoundingMode, &I);
4869
4870 assert(I.getType() == WriteThrough->getType());
4871
4872 Value *AShadow = getShadow(A);
4873 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4874
4875 if (ANumElements * 2 == MaskNumElements) {
4876 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4877 // from the zeroed shadow instead of the writethrough's shadow.
4878 Mask =
4879 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4880 Mask =
4881 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4882 }
4883
4884 // Convert i16 mask to <16 x i1>
4885 Mask = IRB.CreateBitCast(
4886 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4887 "_ms_mask_bitcast");
4888
4889 /// For floating-point to integer conversion, the output is:
4890 /// - fully uninitialized if *any* bit of the input is uninitialized
4891     /// - fully initialized if all bits of the input are initialized
4892 /// We apply the same principle on a per-element basis for vectors.
4893 ///
4894 /// We use the scalar width of the return type instead of A's.
4895 AShadow = IRB.CreateSExt(
4896 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4897 getShadowTy(&I), "_ms_a_shadow");
4898
4899 Value *WriteThroughShadow = getShadow(WriteThrough);
4900 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4901 "_ms_writethru_select");
4902
4903 setShadow(&I, Shadow);
4904 setOriginForNaryOp(I);
4905 }
4906
4907 // Instrument BMI / BMI2 intrinsics.
4908 // All of these intrinsics are Z = I(X, Y)
4909 // where the types of all operands and the result match, and are either i32 or
4910 // i64. The following instrumentation happens to work for all of them:
4911 // Sz = I(Sx, Y) | (sext (Sy != 0))
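  // For example (an illustrative sketch): for BZHI(X, Y), applying the
  // intrinsic to Sx zeroes the shadow of the bits that BZHI discards, while
  // the (sext (Sy != 0)) term poisons the whole result whenever the
  // bit-index operand Y is not fully initialized.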
4912 void handleBmiIntrinsic(IntrinsicInst &I) {
4913 IRBuilder<> IRB(&I);
4914 Type *ShadowTy = getShadowTy(&I);
4915
4916 // If any bit of the mask operand is poisoned, then the whole thing is.
4917 Value *SMask = getShadow(&I, 1);
4918 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4919 ShadowTy);
4920 // Apply the same intrinsic to the shadow of the first operand.
4921 Value *S = IRB.CreateCall(I.getCalledFunction(),
4922 {getShadow(&I, 0), I.getOperand(1)});
4923 S = IRB.CreateOr(SMask, S);
4924 setShadow(&I, S);
4925 setOriginForNaryOp(I);
4926 }
4927
4928 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4929 SmallVector<int, 8> Mask;
4930 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4931 Mask.append(2, X);
4932 }
4933 return Mask;
4934 }
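  // e.g., getPclmulMask(4, /*OddElements=*/false) == {0, 0, 2, 2} and
  //       getPclmulMask(4, /*OddElements=*/true)  == {1, 1, 3, 3}.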
4935
4936 // Instrument pclmul intrinsics.
4937 // These intrinsics operate either on odd or on even elements of the input
4938 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4939 // Replace the unused elements with copies of the used ones, ex:
4940 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4941 // or
4942 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4943 // and then apply the usual shadow combining logic.
4944 void handlePclmulIntrinsic(IntrinsicInst &I) {
4945 IRBuilder<> IRB(&I);
4946 unsigned Width =
4947 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4948 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4949 "pclmul 3rd operand must be a constant");
4950 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4951 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4952 getPclmulMask(Width, Imm & 0x01));
4953 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4954 getPclmulMask(Width, Imm & 0x10));
4955 ShadowAndOriginCombiner SOC(this, IRB);
4956 SOC.Add(Shuf0, getOrigin(&I, 0));
4957 SOC.Add(Shuf1, getOrigin(&I, 1));
4958 SOC.Done(&I);
4959 }
4960
4961 // Instrument _mm_*_sd|ss intrinsics
4962 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4963 IRBuilder<> IRB(&I);
4964 unsigned Width =
4965 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4966 Value *First = getShadow(&I, 0);
4967 Value *Second = getShadow(&I, 1);
4968 // First element of second operand, remaining elements of first operand
4969 SmallVector<int, 16> Mask;
4970 Mask.push_back(Width);
4971 for (unsigned i = 1; i < Width; i++)
4972 Mask.push_back(i);
4973 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4974
4975 setShadow(&I, Shadow);
4976 setOriginForNaryOp(I);
4977 }
4978
4979 void handleVtestIntrinsic(IntrinsicInst &I) {
4980 IRBuilder<> IRB(&I);
4981 Value *Shadow0 = getShadow(&I, 0);
4982 Value *Shadow1 = getShadow(&I, 1);
4983 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4984 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4985 Value *Scalar = convertShadowToScalar(NZ, IRB);
4986 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4987
4988 setShadow(&I, Shadow);
4989 setOriginForNaryOp(I);
4990 }
4991
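  // Instrument binary _mm_*_sd|ss intrinsics, where element 0 of the result
  // depends on element 0 of both operands. Shadow sketch (informal): the
  // result's element-0 shadow is the OR of the operands' element-0 shadows;
  // the remaining elements take the first operand's shadow unchanged.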
4992 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4993 IRBuilder<> IRB(&I);
4994 unsigned Width =
4995 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4996 Value *First = getShadow(&I, 0);
4997 Value *Second = getShadow(&I, 1);
4998 Value *OrShadow = IRB.CreateOr(First, Second);
4999 // First element of both OR'd together, remaining elements of first operand
5000 SmallVector<int, 16> Mask;
5001 Mask.push_back(Width);
5002 for (unsigned i = 1; i < Width; i++)
5003 Mask.push_back(i);
5004 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
5005
5006 setShadow(&I, Shadow);
5007 setOriginForNaryOp(I);
5008 }
5009
5010   // _mm_round_pd / _mm_round_ps.
5011 // Similar to maybeHandleSimpleNomemIntrinsic except
5012 // the second argument is guaranteed to be a constant integer.
5013 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
5014 assert(I.getArgOperand(0)->getType() == I.getType());
5015 assert(I.arg_size() == 2);
5016 assert(isa<ConstantInt>(I.getArgOperand(1)));
5017
5018 IRBuilder<> IRB(&I);
5019 ShadowAndOriginCombiner SC(this, IRB);
5020 SC.Add(I.getArgOperand(0));
5021 SC.Done(&I);
5022 }
5023
5024 // Instrument @llvm.abs intrinsic.
5025 //
5026 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
5027 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
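  //
  // Shadow sketch (informal summary of the handler below): abs propagates
  // Src's shadow unchanged, except that when <is_int_min_poison> is true,
  // any lane whose concrete value equals the signed minimum yields poison,
  // so its shadow is forced to all-ones.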
5028 void handleAbsIntrinsic(IntrinsicInst &I) {
5029 assert(I.arg_size() == 2);
5030 Value *Src = I.getArgOperand(0);
5031 Value *IsIntMinPoison = I.getArgOperand(1);
5032
5033 assert(I.getType()->isIntOrIntVectorTy());
5034
5035 assert(Src->getType() == I.getType());
5036
5037 assert(IsIntMinPoison->getType()->isIntegerTy());
5038 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
5039
5040 IRBuilder<> IRB(&I);
5041 Value *SrcShadow = getShadow(Src);
5042
5043 APInt MinVal =
5044 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
5045 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
5046 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
5047
5048 Value *PoisonedShadow = getPoisonedShadow(Src);
5049 Value *PoisonedIfIntMinShadow =
5050 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
5051 Value *Shadow =
5052 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
5053
5054 setShadow(&I, Shadow);
5055 setOrigin(&I, getOrigin(&I, 0));
5056 }
5057
5058 void handleIsFpClass(IntrinsicInst &I) {
5059 IRBuilder<> IRB(&I);
5060 Value *Shadow = getShadow(&I, 0);
5061 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
5062 setOrigin(&I, getOrigin(&I, 0));
5063 }
5064
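  // Handle the llvm.*.with.overflow intrinsics, e.g.,
  //   {i32, i1} @llvm.sadd.with.overflow.i32(i32, i32)
  // Shadow sketch (informal): the value element's shadow is the OR of the
  // operand shadows; the overflow bit's shadow is poisoned iff any bit of
  // either operand is poisoned.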
5065 void handleArithmeticWithOverflow(IntrinsicInst &I) {
5066 IRBuilder<> IRB(&I);
5067 Value *Shadow0 = getShadow(&I, 0);
5068 Value *Shadow1 = getShadow(&I, 1);
5069 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
5070 Value *ShadowElt1 =
5071 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
5072
5073 Value *Shadow = PoisonValue::get(getShadowTy(&I));
5074 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
5075 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
5076
5077 setShadow(&I, Shadow);
5078 setOriginForNaryOp(I);
5079 }
5080
5081 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
5082 assert(isa<FixedVectorType>(V->getType()));
5083 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
5084 Value *Shadow = getShadow(V);
5085 return IRB.CreateExtractElement(Shadow,
5086 ConstantInt::get(IRB.getInt32Ty(), 0));
5087 }
5088
5089 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.{128,256,512}
5090 //
5091 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
5092 // (<8 x i64>, <16 x i8>, i8)
5093 // A WriteThru Mask
5094 //
5095 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
5096 // (<16 x i32>, <16 x i8>, i16)
5097 //
5098 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
5099 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
5100 //
5101 // If Dst has more elements than A, the excess elements are zeroed (and the
5102 // corresponding shadow is initialized).
5103 //
5104 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
5105 // and is much faster than this handler.
5106 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
5107 IRBuilder<> IRB(&I);
5108
5109 assert(I.arg_size() == 3);
5110 Value *A = I.getOperand(0);
5111 Value *WriteThrough = I.getOperand(1);
5112 Value *Mask = I.getOperand(2);
5113
5114 assert(isFixedIntVector(A));
5115 assert(isFixedIntVector(WriteThrough));
5116
5117 unsigned ANumElements =
5118 cast<FixedVectorType>(A->getType())->getNumElements();
5119 unsigned OutputNumElements =
5120 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
5121 assert(ANumElements == OutputNumElements ||
5122 ANumElements * 2 == OutputNumElements);
5123 // N.B. some PMOV{,S,US} instructions have a 4x or even 8x ratio in the
5124 // number of elements e.g.,
5125 // <16 x i8> @llvm.x86.avx512.mask.pmovs.qb.256
5126 // (<4 x i64>, <16 x i8>, i8)
5127 // <16 x i8> @llvm.x86.avx512.mask.pmovs.qb.128
5128 // (<2 x i64>, <16 x i8>, i8)
5129 // However, we currently handle those elsewhere.
5130
5131 assert(Mask->getType()->isIntegerTy());
5132 insertCheckShadowOf(Mask, &I);
5133
5134 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5135 if (Mask->getType()->getScalarSizeInBits() == 8 && OutputNumElements < 8)
5136 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, OutputNumElements));
5137 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5138
5139 assert(I.getType() == WriteThrough->getType());
5140
5141 // Widen the mask, if necessary, to have one bit per element of the output
5142 // vector.
5143 // We want the extra bits to have '1's, so that the CreateSelect will
5144 // select the values from AShadow instead of WriteThroughShadow ("maskless"
5145 // versions of the intrinsics are sometimes implemented using an all-1's
5146 // mask and an undefined value for WriteThroughShadow). We accomplish this
5147 // by using bitwise NOT before and after the ZExt.
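      // Illustrative example (values chosen arbitrarily): an i8 mask 0x0F
      // widened to i16 becomes NOT(ZExt(NOT(0x0F))) = NOT(0x00F0) = 0xFF0F,
      // i.e., the newly added high bits are all 1, so those output lanes
      // take the zero-extended (clean) AShadow rather than WriteThroughShadow.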
5148 if (ANumElements != OutputNumElements) {
5149 Mask = IRB.CreateNot(Mask);
5150 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
5151 "_ms_widen_mask");
5152 Mask = IRB.CreateNot(Mask);
5153 }
5154 Mask = IRB.CreateBitCast(
5155 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5156
5157 Value *AShadow = getShadow(A);
5158
5159 // The return type might have more elements than the input.
5160 // Temporarily shrink the return type's number of elements.
5161 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
5162
5163 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
5164 // This handler treats them all as truncation, which leads to some rare
5165 // false positives in the cases where the truncated bytes could
5166 // unambiguously saturate the value e.g., if A = ??????10 ????????
5167 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
5168 // fully defined, but the truncated byte is ????????.
5169 //
5170 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
5171 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
5172 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
5173
5174 Value *WriteThroughShadow = getShadow(WriteThrough);
5175
5176 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
5177 setShadow(&I, Shadow);
5178 setOriginForNaryOp(I);
5179 }
5180
5181 // Handle llvm.x86.avx512.* instructions that take vector(s) of floating-point
5182 // values and perform an operation whose shadow propagation should be handled
5183 // as all-or-nothing [*], with masking provided by a vector and a mask
5184 // supplied as an integer.
5185 //
5186 // [*] if all bits of a vector element are initialized, the output is fully
5187 // initialized; otherwise, the output is fully uninitialized
5188 //
5189 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
5190 // (<16 x float>, <16 x float>, i16)
5191 // A WriteThru Mask
5192 //
5193 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
5194 // (<2 x double>, <2 x double>, i8)
5195 // A WriteThru Mask
5196 //
5197 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
5198 // (<8 x double>, i32, <8 x double>, i8, i32)
5199 // A Imm WriteThru Mask Rounding
5200 //
5201 // <16 x float> @llvm.x86.avx512.mask.scalef.ps.512
5202 // (<16 x float>, <16 x float>, <16 x float>, i16, i32)
5203 // WriteThru A B Mask Rnd
5204 //
5205 // All operands other than A, B, ..., and WriteThru (e.g., Mask, Imm,
5206 // Rounding) must be fully initialized.
5207 //
5208 // Dst[i] = Mask[i] ? some_op(A[i], B[i], ...)
5209 // : WriteThru[i]
5210 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i] | B_shadow[i] | ...)
5211 // : WriteThru_shadow[i]
5212 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I,
5213 SmallVector<unsigned, 4> DataIndices,
5214 unsigned WriteThruIndex,
5215 unsigned MaskIndex) {
5216 IRBuilder<> IRB(&I);
5217
5218 unsigned NumArgs = I.arg_size();
5219
5220 assert(WriteThruIndex < NumArgs);
5221 assert(MaskIndex < NumArgs);
5222 assert(WriteThruIndex != MaskIndex);
5223 Value *WriteThru = I.getOperand(WriteThruIndex);
5224
5225 unsigned OutputNumElements =
5226 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
5227
5228 assert(DataIndices.size() > 0);
5229
5230 bool isData[16] = {false};
5231 assert(NumArgs <= 16);
5232 for (unsigned i : DataIndices) {
5233 assert(i < NumArgs);
5234 assert(i != WriteThruIndex);
5235 assert(i != MaskIndex);
5236
5237 isData[i] = true;
5238
5239 Value *A = I.getOperand(i);
5240 assert(isFixedFPVector(A));
5241 [[maybe_unused]] unsigned ANumElements =
5242 cast<FixedVectorType>(A->getType())->getNumElements();
5243 assert(ANumElements == OutputNumElements);
5244 }
5245
5246 Value *Mask = I.getOperand(MaskIndex);
5247
5248 assert(isFixedFPVector(WriteThru));
5249
5250 for (unsigned i = 0; i < NumArgs; ++i) {
5251 if (!isData[i] && i != WriteThruIndex) {
5252 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5253 // they be fully initialized.
5254 assert(I.getOperand(i)->getType()->isIntegerTy());
5255 insertCheckShadowOf(I.getOperand(i), &I);
5256 }
5257 }
5258
5259 // The mask has 1 bit per element of A, but a minimum of 8 bits.
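    // e.g., <2 x double> @llvm.x86.avx512.rcp14.pd.128 has an i8 mask but
    // only 2 output elements, so the mask is first truncated to i2 before
    // the bitcast to <2 x i1> below.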
5260 if (Mask->getType()->getScalarSizeInBits() == 8 && OutputNumElements < 8)
5261 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, OutputNumElements));
5262 assert(Mask->getType()->getScalarSizeInBits() == OutputNumElements);
5263
5264 assert(I.getType() == WriteThru->getType());
5265
5266 Mask = IRB.CreateBitCast(
5267 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5268
5269 Value *DataShadow = nullptr;
5270 for (unsigned i : DataIndices) {
5271 Value *A = I.getOperand(i);
5272 if (DataShadow)
5273 DataShadow = IRB.CreateOr(DataShadow, getShadow(A));
5274 else
5275 DataShadow = getShadow(A);
5276 }
5277
5278 // All-or-nothing shadow
5279 DataShadow =
5280 IRB.CreateSExt(IRB.CreateICmpNE(DataShadow, getCleanShadow(DataShadow)),
5281 DataShadow->getType());
5282
5283 Value *WriteThruShadow = getShadow(WriteThru);
5284
5285 Value *Shadow = IRB.CreateSelect(Mask, DataShadow, WriteThruShadow);
5286 setShadow(&I, Shadow);
5287
5288 setOriginForNaryOp(I);
5289 }
5290
5291 // For sh.* compiler intrinsics:
5292 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5293 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5294 // A B WriteThru Mask RoundingMode
5295 //
5296 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5297 // DstShadow[1..7] = AShadow[1..7]
5298 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5299 IRBuilder<> IRB(&I);
5300
5301 assert(I.arg_size() == 5);
5302 Value *A = I.getOperand(0);
5303 Value *B = I.getOperand(1);
5304 Value *WriteThrough = I.getOperand(2);
5305 Value *Mask = I.getOperand(3);
5306 Value *RoundingMode = I.getOperand(4);
5307
5308 // Technically, we could probably just check whether the LSB is
5309 // initialized, but intuitively it feels like a partly uninitialized mask
5310 // is unintended, and we should warn the user immediately.
5311 insertCheckShadowOf(Mask, &I);
5312 insertCheckShadowOf(RoundingMode, &I);
5313
5314 assert(isa<FixedVectorType>(A->getType()));
5315 unsigned NumElements =
5316 cast<FixedVectorType>(A->getType())->getNumElements();
5317 assert(NumElements == 8);
5318 assert(A->getType() == B->getType());
5319 assert(B->getType() == WriteThrough->getType());
5320 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5321 assert(RoundingMode->getType()->isIntegerTy());
5322
5323 Value *ALowerShadow = extractLowerShadow(IRB, A);
5324 Value *BLowerShadow = extractLowerShadow(IRB, B);
5325
5326 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5327
5328 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5329
5330 Mask = IRB.CreateBitCast(
5331 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5332 Value *MaskLower =
5333 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5334
5335 Value *AShadow = getShadow(A);
5336 Value *DstLowerShadow =
5337 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5338 Value *DstShadow = IRB.CreateInsertElement(
5339 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5340 "_msprop");
5341
5342 setShadow(&I, DstShadow);
5343 setOriginForNaryOp(I);
5344 }
5345
5346 // Approximately handle AVX Galois Field Affine Transformation
5347 //
5348 // e.g.,
5349 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5350 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5351 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5352 // Out A x b
5353 // where A and x are packed matrices, b is a vector,
5354 // Out = A * x + b in GF(2)
5355 //
5356 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5357 // computation also includes a parity calculation.
5358 //
5359 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5360 // Out_Shadow = (V1_Shadow & V2_Shadow)
5361 // | (V1 & V2_Shadow)
5362 // | (V1_Shadow & V2 )
5363 //
5364 // We approximate the shadow of gf2p8affineqb using:
5365 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5366 // | gf2p8affineqb(x, A_shadow, 0)
5367 // | gf2p8affineqb(x_Shadow, A, 0)
5368 // | set1_epi8(b_Shadow)
5369 //
5370 // This approximation has false negatives: if an intermediate dot-product
5371 // contains an even number of 1's, the parity is 0.
5372 // It has no false positives.
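  //
  // Illustration of the false-negative case (roughly): if two of the eight
  // products feeding an output bit would each be marked uninitialized, the
  // parity computed over the shadows XORs those two 1's to 0, so that term
  // contributes nothing to the OR and the bit may be reported as clean.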
5373 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5374 IRBuilder<> IRB(&I);
5375
5376 assert(I.arg_size() == 3);
5377 Value *A = I.getOperand(0);
5378 Value *X = I.getOperand(1);
5379 Value *B = I.getOperand(2);
5380
5381 assert(isFixedIntVector(A));
5382 assert(cast<VectorType>(A->getType())
5383 ->getElementType()
5384 ->getScalarSizeInBits() == 8);
5385
5386 assert(A->getType() == X->getType());
5387
5388 assert(B->getType()->isIntegerTy());
5389 assert(B->getType()->getScalarSizeInBits() == 8);
5390
5391 assert(I.getType() == A->getType());
5392
5393 Value *AShadow = getShadow(A);
5394 Value *XShadow = getShadow(X);
5395 Value *BZeroShadow = getCleanShadow(B);
5396
5397 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5398 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5399 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5400 {X, AShadow, BZeroShadow});
5401 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5402 {XShadow, A, BZeroShadow});
5403
5404 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5405 Value *BShadow = getShadow(B);
5406 Value *BBroadcastShadow = getCleanShadow(AShadow);
5407 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5408 // This loop generates a lot of LLVM IR, which we expect that CodeGen will
5409 // lower appropriately (e.g., VPBROADCASTB).
5410 // Besides, b is often a constant, in which case it is fully initialized.
5411 for (unsigned i = 0; i < NumElements; i++)
5412 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5413
5414 setShadow(&I, IRB.CreateOr(
5415 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5416 setOriginForNaryOp(I);
5417 }
5418
5419 // Handle Arm NEON vector load intrinsics (vld*).
5420 //
5421 // The WithLane instructions (ld[234]lane) are similar to:
5422 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5423 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5424 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5425 // %A)
5426 //
5427 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5428 // to:
5429 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5430 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5431 unsigned int numArgs = I.arg_size();
5432
5433 // Return type is a struct of vectors of integers or floating-point
5434 assert(I.getType()->isStructTy());
5435 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5436 assert(RetTy->getNumElements() > 0);
5437     assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5438            RetTy->getElementType(0)->isFPOrFPVectorTy());
5439 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5440 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5441
5442 if (WithLane) {
5443 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5444 assert(4 <= numArgs && numArgs <= 6);
5445
5446 // Return type is a struct of the input vectors
5447 assert(RetTy->getNumElements() + 2 == numArgs);
5448 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5449 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5450 } else {
5451 assert(numArgs == 1);
5452 }
5453
5454 IRBuilder<> IRB(&I);
5455
5456 SmallVector<Value *, 6> ShadowArgs;
5457 if (WithLane) {
5458 for (unsigned int i = 0; i < numArgs - 2; i++)
5459 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5460
5461 // Lane number, passed verbatim
5462 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5463 ShadowArgs.push_back(LaneNumber);
5464
5465 // TODO: blend shadow of lane number into output shadow?
5466 insertCheckShadowOf(LaneNumber, &I);
5467 }
5468
5469 Value *Src = I.getArgOperand(numArgs - 1);
5470 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5471
5472 Type *SrcShadowTy = getShadowTy(Src);
5473 auto [SrcShadowPtr, SrcOriginPtr] =
5474 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5475 ShadowArgs.push_back(SrcShadowPtr);
5476
5477 // The NEON vector load instructions handled by this function all have
5478 // integer variants. It is easier to use those rather than trying to cast
5479 // a struct of vectors of floats into a struct of vectors of integers.
5480 CallInst *CI =
5481 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5482 setShadow(&I, CI);
5483
5484 if (!MS.TrackOrigins)
5485 return;
5486
5487 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5488 setOrigin(&I, PtrSrcOrigin);
5489 }
5490
5491 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5492 /// and vst{2,3,4}lane).
5493 ///
5494 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5495 /// last argument, with the initial arguments being the inputs (and lane
5496 /// number for vst{2,3,4}lane). They return void.
5497 ///
5498 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5499 /// abcdabcdabcdabcd... into *outP
5500 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5501 /// writes aaaa...bbbb...cccc...dddd... into *outP
5502 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5503 /// These instructions can all be instrumented with essentially the same
5504 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
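  ///
  /// Sketch (informal): for st2(%A, %B, %P) we emit
  ///   st2(shadow(%A), shadow(%B), shadow_ptr(%P))
  /// so the shadows are interleaved into the shadow memory of *%P exactly as
  /// the values are interleaved into *%P.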
5505 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5506 IRBuilder<> IRB(&I);
5507
5508 // Don't use getNumOperands() because it includes the callee
5509 int numArgOperands = I.arg_size();
5510
5511 // The last arg operand is the output (pointer)
5512 assert(numArgOperands >= 1);
5513 Value *Addr = I.getArgOperand(numArgOperands - 1);
5514 assert(Addr->getType()->isPointerTy());
5515 int skipTrailingOperands = 1;
5516
5517     if (ClCheckAccessAddress)
5518       insertCheckShadowOf(Addr, &I);
5519
5520 // Second-last operand is the lane number (for vst{2,3,4}lane)
5521 if (useLane) {
5522 skipTrailingOperands++;
5523 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5524       assert(isa<IntegerType>(
5525           I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5526 }
5527
5528 SmallVector<Value *, 8> ShadowArgs;
5529 // All the initial operands are the inputs
5530 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5531 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5532 Value *Shadow = getShadow(&I, i);
5533 ShadowArgs.append(1, Shadow);
5534 }
5535
5536 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5537 // e.g., for:
5538 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5539 // we know the type of the output (and its shadow) is <16 x i8>.
5540 //
5541 // Arm NEON VST is unusual because the last argument is the output address:
5542 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5543 // call void @llvm.aarch64.neon.st2.v16i8.p0
5544 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5545 // and we have no type information about P's operand. We must manually
5546 // compute the type (<16 x i8> x 2).
5547 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5548 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5549 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5550 (numArgOperands - skipTrailingOperands));
5551 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5552
5553 if (useLane)
5554 ShadowArgs.append(1,
5555 I.getArgOperand(numArgOperands - skipTrailingOperands));
5556
5557 Value *OutputShadowPtr, *OutputOriginPtr;
5558 // AArch64 NEON does not need alignment (unless OS requires it)
5559 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5560 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5561 ShadowArgs.append(1, OutputShadowPtr);
5562
5563 CallInst *CI =
5564 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5565 setShadow(&I, CI);
5566
5567 if (MS.TrackOrigins) {
5568 // TODO: if we modelled the vst* instruction more precisely, we could
5569 // more accurately track the origins (e.g., if both inputs are
5570 // uninitialized for vst2, we currently blame the second input, even
5571 // though part of the output depends only on the first input).
5572 //
5573 // This is particularly imprecise for vst{2,3,4}lane, since only one
5574 // lane of each input is actually copied to the output.
5575 OriginCombiner OC(this, IRB);
5576 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5577 OC.Add(I.getArgOperand(i));
5578
5579 const DataLayout &DL = F.getDataLayout();
5580 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5581 OutputOriginPtr);
5582 }
5583 }
5584
5585 // Integer matrix multiplication:
5586 // - <4 x i32> @llvm.aarch64.neon.{s,u,us}mmla.v4i32.v16i8
5587 // (<4 x i32> %R, <16 x i8> %A, <16 x i8> %B)
5588 // - <4 x i32> is a 2x2 matrix
5589 // - <16 x i8> %A and %B are 2x8 and 8x2 matrices respectively
5590 //
5591 // Floating-point matrix multiplication:
5592 // - <4 x float> @llvm.aarch64.neon.bfmmla
5593 // (<4 x float> %R, <8 x bfloat> %A, <8 x bfloat> %B)
5594 // - <4 x float> is a 2x2 matrix
5595 // - <8 x bfloat> %A and %B are 2x4 and 4x2 matrices respectively
5596 //
5597 // The general shadow propagation approach is:
5598 // 1) get the shadows of the input matrices %A and %B
5599 // 2) map each shadow value to 0x1 if the corresponding value is fully
5600 // initialized, and 0x0 otherwise
5601 // 3) perform a matrix multiplication on the shadows of %A and %B [*].
5602 // The output will be a 2x2 matrix. For each element, a value of 0x8
5603 // (for {s,u,us}mmla) or 0x4 (for bfmmla) means all the corresponding
5604 // inputs were clean; if so, set the shadow to zero, otherwise set to -1.
5605 // 4) blend in the shadow of %R
5606 //
5607 // [*] Since shadows are integral, the obvious approach is to always apply
5608 // ummla to the shadows. Unfortunately, Armv8.2+bf16 supports bfmmla,
5609 // but not ummla. Thus, for bfmmla, our instrumentation reuses bfmmla.
5610 //
5611 // TODO: consider allowing multiplication of zero with an uninitialized value
5612 // to result in an initialized value.
5613 void handleNEONMatrixMultiply(IntrinsicInst &I) {
5614 IRBuilder<> IRB(&I);
5615
5616 assert(I.arg_size() == 3);
5617 Value *R = I.getArgOperand(0);
5618 Value *A = I.getArgOperand(1);
5619 Value *B = I.getArgOperand(2);
5620
5621 assert(I.getType() == R->getType());
5622
5623 assert(isa<FixedVectorType>(R->getType()));
5624 assert(isa<FixedVectorType>(A->getType()));
5625 assert(isa<FixedVectorType>(B->getType()));
5626
5627 FixedVectorType *RTy = cast<FixedVectorType>(R->getType());
5628 FixedVectorType *ATy = cast<FixedVectorType>(A->getType());
5629 FixedVectorType *BTy = cast<FixedVectorType>(B->getType());
5630 assert(ATy->getElementType() == BTy->getElementType());
5631
5632 if (RTy->getElementType()->isIntegerTy()) {
5633 // <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
5634 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5635 assert(RTy == FixedVectorType::get(IntegerType::get(*MS.C, 32), 4));
5636 assert(ATy == FixedVectorType::get(IntegerType::get(*MS.C, 8), 16));
5637 assert(BTy == FixedVectorType::get(IntegerType::get(*MS.C, 8), 16));
5638 } else {
5639 // <4 x float> @llvm.aarch64.neon.bfmmla
5640 // (<4 x float> %R, <8 x bfloat> %X, <8 x bfloat> %Y)
5641 assert(RTy == FixedVectorType::get(Type::getFloatTy(*MS.C), 4));
5642 assert(ATy == FixedVectorType::get(Type::getBFloatTy(*MS.C), 8));
5643 assert(BTy == FixedVectorType::get(Type::getBFloatTy(*MS.C), 8));
5644 }
5645
5646 Value *ShadowR = getShadow(&I, 0);
5647 Value *ShadowA = getShadow(&I, 1);
5648 Value *ShadowB = getShadow(&I, 2);
5649
5650 Value *ShadowAB;
5651 Value *FullyInit;
5652
5653 if (RTy->getElementType()->isIntegerTy()) {
5654 // If the value is fully initialized, the shadow will be 000...001.
5655 // Otherwise, the shadow will be all zero.
5656 // (This is the opposite of how we typically handle shadows.)
5657 ShadowA = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowA, getCleanShadow(ATy)),
5658 getShadowTy(ATy));
5659 ShadowB = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowB, getCleanShadow(BTy)),
5660 getShadowTy(BTy));
5661 // TODO: the CreateSelect approach used below for floating-point is more
5662 // generic than CreateZExt. Investigate whether it is worthwhile
5663 // unifying the two approaches.
5664
5665 ShadowAB = IRB.CreateIntrinsic(RTy, Intrinsic::aarch64_neon_ummla,
5666 {getCleanShadow(RTy), ShadowA, ShadowB});
5667
5668 // ummla multiplies a 2x8 matrix with an 8x2 matrix. If all entries of the
5669 // input matrices are equal to 0x1, all entries of the output matrix will
5670 // be 0x8.
5671 FullyInit = ConstantVector::getSplat(
5672 RTy->getElementCount(), ConstantInt::get(RTy->getElementType(), 0x8));
5673
5674 ShadowAB = IRB.CreateICmpNE(ShadowAB, FullyInit);
5675 } else {
5676       Value *ABZeros = ConstantVector::getSplat(
5677           ATy->getElementCount(), ConstantFP::get(ATy->getElementType(), 0));
5678       Value *ABOnes = ConstantVector::getSplat(
5679           ATy->getElementCount(), ConstantFP::get(ATy->getElementType(), 1));
5680
5681 // As per the integer case, if the shadow is clean, we store 0x1,
5682 // otherwise we store 0x0 (the opposite of usual shadow arithmetic).
5683 ShadowA = IRB.CreateSelect(IRB.CreateICmpEQ(ShadowA, getCleanShadow(ATy)),
5684 ABOnes, ABZeros);
5685 ShadowB = IRB.CreateSelect(IRB.CreateICmpEQ(ShadowB, getCleanShadow(BTy)),
5686 ABOnes, ABZeros);
5687
5688       Value *RZeros = ConstantVector::getSplat(
5689           RTy->getElementCount(), ConstantFP::get(RTy->getElementType(), 0));
5690
5691 ShadowAB = IRB.CreateIntrinsic(RTy, Intrinsic::aarch64_neon_bfmmla,
5692 {RZeros, ShadowA, ShadowB});
5693
5694       // bfmmla multiplies a 2x4 matrix with a 4x2 matrix. If all entries of
5695 // the input matrices are equal to 0x1, all entries of the output matrix
5696 // will be 4.0. (To avoid floating-point error, we check if each entry
5697 // < 3.5.)
5698 FullyInit = ConstantVector::getSplat(
5699 RTy->getElementCount(), ConstantFP::get(RTy->getElementType(), 3.5));
5700
5701       // FCmpULT: "yields true if either operand is a QNAN or op1 is less than
5702       //          op2"
5703 ShadowAB = IRB.CreateFCmpULT(ShadowAB, FullyInit);
5704 }
5705
5706 ShadowR = IRB.CreateICmpNE(ShadowR, getCleanShadow(RTy));
5707 ShadowR = IRB.CreateOr(ShadowAB, ShadowR);
5708
5709 setShadow(&I, IRB.CreateSExt(ShadowR, getShadowTy(RTy)));
5710
5711 setOriginForNaryOp(I);
5712 }
5713
5714 /// Handle intrinsics by applying the intrinsic to the shadows.
5715 ///
5716 /// The trailing arguments are passed verbatim to the intrinsic, though any
5717 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5718 /// intrinsic with one trailing verbatim argument:
5719 /// out = intrinsic(var1, var2, opType)
5720 /// we compute:
5721 /// shadow[out] =
5722 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5723 ///
5724 /// Typically, shadowIntrinsicID will be specified by the caller to be
5725 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5726 /// intrinsic of the same type.
5727 ///
5728 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5729 /// bit-patterns (for example, if the intrinsic accepts floats for
5730 /// var1, we require that it doesn't care if inputs are NaNs).
5731 ///
5732 /// For example, this can be applied to the Arm NEON vector table intrinsics
5733 /// (tbl{1,2,3,4}).
5734 ///
5735 /// The origin is approximated using setOriginForNaryOp.
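  ///
  /// Concrete sketch (assuming a caller such as NEON tbl1 passes
  /// trailingVerbatimArgs == 1):
  ///   %out = tbl1(%table, %idx)
  ///   shadow[%out] = tbl1(shadow[%table], %idx) | shadow[%idx]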
5736 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5737 Intrinsic::ID shadowIntrinsicID,
5738 unsigned int trailingVerbatimArgs) {
5739 IRBuilder<> IRB(&I);
5740
5741 assert(trailingVerbatimArgs < I.arg_size());
5742
5743 SmallVector<Value *, 8> ShadowArgs;
5744 // Don't use getNumOperands() because it includes the callee
5745 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5746 Value *Shadow = getShadow(&I, i);
5747
5748 // Shadows are integer-ish types but some intrinsics require a
5749 // different (e.g., floating-point) type.
5750 ShadowArgs.push_back(
5751 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5752 }
5753
5754 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5755 i++) {
5756 Value *Arg = I.getArgOperand(i);
5757 ShadowArgs.push_back(Arg);
5758 }
5759
5760 CallInst *CI =
5761 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5762 Value *CombinedShadow = CI;
5763
5764 // Combine the computed shadow with the shadow of trailing args
5765 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5766 i++) {
5767 Value *Shadow =
5768 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5769 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5770 }
5771
5772 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5773
5774 setOriginForNaryOp(I);
5775 }
5776
5777 // Approximation only
5778 //
5779 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5780 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5781 assert(I.arg_size() == 2);
5782
5783 handleShadowOr(I);
5784 }
5785
5786 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5787 switch (I.getIntrinsicID()) {
5788 case Intrinsic::uadd_with_overflow:
5789 case Intrinsic::sadd_with_overflow:
5790 case Intrinsic::usub_with_overflow:
5791 case Intrinsic::ssub_with_overflow:
5792 case Intrinsic::umul_with_overflow:
5793 case Intrinsic::smul_with_overflow:
5794 handleArithmeticWithOverflow(I);
5795 break;
5796 case Intrinsic::abs:
5797 handleAbsIntrinsic(I);
5798 break;
5799 case Intrinsic::bitreverse:
5800 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5801 /*trailingVerbatimArgs*/ 0);
5802 break;
5803 case Intrinsic::is_fpclass:
5804 handleIsFpClass(I);
5805 break;
5806 case Intrinsic::lifetime_start:
5807 handleLifetimeStart(I);
5808 break;
5809 case Intrinsic::launder_invariant_group:
5810 case Intrinsic::strip_invariant_group:
5811 handleInvariantGroup(I);
5812 break;
5813 case Intrinsic::bswap:
5814 handleBswap(I);
5815 break;
5816 case Intrinsic::ctlz:
5817 case Intrinsic::cttz:
5818 handleCountLeadingTrailingZeros(I);
5819 break;
5820 case Intrinsic::masked_compressstore:
5821 handleMaskedCompressStore(I);
5822 break;
5823 case Intrinsic::masked_expandload:
5824 handleMaskedExpandLoad(I);
5825 break;
5826 case Intrinsic::masked_gather:
5827 handleMaskedGather(I);
5828 break;
5829 case Intrinsic::masked_scatter:
5830 handleMaskedScatter(I);
5831 break;
5832 case Intrinsic::masked_store:
5833 handleMaskedStore(I);
5834 break;
5835 case Intrinsic::masked_load:
5836 handleMaskedLoad(I);
5837 break;
5838 case Intrinsic::vector_reduce_and:
5839 handleVectorReduceAndIntrinsic(I);
5840 break;
5841 case Intrinsic::vector_reduce_or:
5842 handleVectorReduceOrIntrinsic(I);
5843 break;
5844
5845 case Intrinsic::vector_reduce_add:
5846 case Intrinsic::vector_reduce_xor:
5847 case Intrinsic::vector_reduce_mul:
5848 // Signed/Unsigned Min/Max
5849 // TODO: handling similarly to AND/OR may be more precise.
5850 case Intrinsic::vector_reduce_smax:
5851 case Intrinsic::vector_reduce_smin:
5852 case Intrinsic::vector_reduce_umax:
5853 case Intrinsic::vector_reduce_umin:
5854 // TODO: this has no false positives, but arguably we should check that all
5855 // the bits are initialized.
5856 case Intrinsic::vector_reduce_fmax:
5857 case Intrinsic::vector_reduce_fmin:
5858 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5859 break;
5860
5861 case Intrinsic::vector_reduce_fadd:
5862 case Intrinsic::vector_reduce_fmul:
5863 handleVectorReduceWithStarterIntrinsic(I);
5864 break;
5865
5866 case Intrinsic::scmp:
5867 case Intrinsic::ucmp: {
5868 handleShadowOr(I);
5869 break;
5870 }
5871
5872 case Intrinsic::fshl:
5873 case Intrinsic::fshr:
5874 handleFunnelShift(I);
5875 break;
5876
5877 case Intrinsic::is_constant:
5878 // The result of llvm.is.constant() is always defined.
5879 setShadow(&I, getCleanShadow(&I));
5880 setOrigin(&I, getCleanOrigin());
5881 break;
5882
5883 // The non-saturating versions are handled by visitFPTo[US]IInst().
5884 //
5885 // N.B. some platform-specific intrinsics, such as AArch64 fcvtz[us], are
5886 // lowered to these cross-platform intrinsics.
5887 case Intrinsic::fptosi_sat:
5888 case Intrinsic::fptoui_sat:
5889 handleGenericVectorConvertIntrinsic(I, /*FixedPoint=*/false);
5890 break;
5891
5892 default:
5893 return false;
5894 }
5895
5896 return true;
5897 }
5898
5899 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5900 switch (I.getIntrinsicID()) {
5901 case Intrinsic::x86_sse_stmxcsr:
5902 handleStmxcsr(I);
5903 break;
5904 case Intrinsic::x86_sse_ldmxcsr:
5905 handleLdmxcsr(I);
5906 break;
5907
5908 // Convert Scalar Double Precision Floating-Point Value
5909 // to Unsigned Doubleword Integer
5910 // etc.
5911 case Intrinsic::x86_avx512_vcvtsd2usi64:
5912 case Intrinsic::x86_avx512_vcvtsd2usi32:
5913 case Intrinsic::x86_avx512_vcvtss2usi64:
5914 case Intrinsic::x86_avx512_vcvtss2usi32:
5915 case Intrinsic::x86_avx512_cvttss2usi64:
5916 case Intrinsic::x86_avx512_cvttss2usi:
5917 case Intrinsic::x86_avx512_cvttsd2usi64:
5918 case Intrinsic::x86_avx512_cvttsd2usi:
5919 case Intrinsic::x86_avx512_cvtusi2ss:
5920 case Intrinsic::x86_avx512_cvtusi642sd:
5921 case Intrinsic::x86_avx512_cvtusi642ss:
5922 handleSSEVectorConvertIntrinsic(I, 1, true);
5923 break;
5924 case Intrinsic::x86_sse2_cvtsd2si64:
5925 case Intrinsic::x86_sse2_cvtsd2si:
5926 case Intrinsic::x86_sse2_cvtsd2ss:
5927 case Intrinsic::x86_sse2_cvttsd2si64:
5928 case Intrinsic::x86_sse2_cvttsd2si:
5929 case Intrinsic::x86_sse_cvtss2si64:
5930 case Intrinsic::x86_sse_cvtss2si:
5931 case Intrinsic::x86_sse_cvttss2si64:
5932 case Intrinsic::x86_sse_cvttss2si:
5933 handleSSEVectorConvertIntrinsic(I, 1);
5934 break;
5935 case Intrinsic::x86_sse_cvtps2pi:
5936 case Intrinsic::x86_sse_cvttps2pi:
5937 handleSSEVectorConvertIntrinsic(I, 2);
5938 break;
5939
5940 // TODO:
5941 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5942 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5943 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5944
5945 case Intrinsic::x86_vcvtps2ph_128:
5946 case Intrinsic::x86_vcvtps2ph_256: {
5947 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5948 break;
5949 }
5950
5951 // Convert Packed Single Precision Floating-Point Values
5952 // to Packed Signed Doubleword Integer Values
5953 //
5954 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5955 // (<16 x float>, <16 x i32>, i16, i32)
5956 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5957 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5958 break;
5959
5960 // Convert Packed Double Precision Floating-Point Values
5961 // to Packed Single Precision Floating-Point Values
5962 case Intrinsic::x86_sse2_cvtpd2ps:
5963 case Intrinsic::x86_sse2_cvtps2dq:
5964 case Intrinsic::x86_sse2_cvtpd2dq:
5965 case Intrinsic::x86_sse2_cvttps2dq:
5966 case Intrinsic::x86_sse2_cvttpd2dq:
5967 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5968 case Intrinsic::x86_avx_cvt_ps2dq_256:
5969 case Intrinsic::x86_avx_cvt_pd2dq_256:
5970 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5971 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5972 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5973 break;
5974 }
5975
5976 // Convert Single-Precision FP Value to 16-bit FP Value
5977 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5978 // (<16 x float>, i32, <16 x i16>, i16)
5979 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5980 // (<4 x float>, i32, <8 x i16>, i8)
5981 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5982 // (<8 x float>, i32, <8 x i16>, i8)
5983 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5984 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5985 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5986 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5987 break;
5988
5989 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5990 case Intrinsic::x86_avx512_psll_w_512:
5991 case Intrinsic::x86_avx512_psll_d_512:
5992 case Intrinsic::x86_avx512_psll_q_512:
5993 case Intrinsic::x86_avx512_pslli_w_512:
5994 case Intrinsic::x86_avx512_pslli_d_512:
5995 case Intrinsic::x86_avx512_pslli_q_512:
5996 case Intrinsic::x86_avx512_psrl_w_512:
5997 case Intrinsic::x86_avx512_psrl_d_512:
5998 case Intrinsic::x86_avx512_psrl_q_512:
5999 case Intrinsic::x86_avx512_psra_w_512:
6000 case Intrinsic::x86_avx512_psra_d_512:
6001 case Intrinsic::x86_avx512_psra_q_512:
6002 case Intrinsic::x86_avx512_psrli_w_512:
6003 case Intrinsic::x86_avx512_psrli_d_512:
6004 case Intrinsic::x86_avx512_psrli_q_512:
6005 case Intrinsic::x86_avx512_psrai_w_512:
6006 case Intrinsic::x86_avx512_psrai_d_512:
6007 case Intrinsic::x86_avx512_psrai_q_512:
6008 case Intrinsic::x86_avx512_psra_q_256:
6009 case Intrinsic::x86_avx512_psra_q_128:
6010 case Intrinsic::x86_avx512_psrai_q_256:
6011 case Intrinsic::x86_avx512_psrai_q_128:
6012 case Intrinsic::x86_avx2_psll_w:
6013 case Intrinsic::x86_avx2_psll_d:
6014 case Intrinsic::x86_avx2_psll_q:
6015 case Intrinsic::x86_avx2_pslli_w:
6016 case Intrinsic::x86_avx2_pslli_d:
6017 case Intrinsic::x86_avx2_pslli_q:
6018 case Intrinsic::x86_avx2_psrl_w:
6019 case Intrinsic::x86_avx2_psrl_d:
6020 case Intrinsic::x86_avx2_psrl_q:
6021 case Intrinsic::x86_avx2_psra_w:
6022 case Intrinsic::x86_avx2_psra_d:
6023 case Intrinsic::x86_avx2_psrli_w:
6024 case Intrinsic::x86_avx2_psrli_d:
6025 case Intrinsic::x86_avx2_psrli_q:
6026 case Intrinsic::x86_avx2_psrai_w:
6027 case Intrinsic::x86_avx2_psrai_d:
6028 case Intrinsic::x86_sse2_psll_w:
6029 case Intrinsic::x86_sse2_psll_d:
6030 case Intrinsic::x86_sse2_psll_q:
6031 case Intrinsic::x86_sse2_pslli_w:
6032 case Intrinsic::x86_sse2_pslli_d:
6033 case Intrinsic::x86_sse2_pslli_q:
6034 case Intrinsic::x86_sse2_psrl_w:
6035 case Intrinsic::x86_sse2_psrl_d:
6036 case Intrinsic::x86_sse2_psrl_q:
6037 case Intrinsic::x86_sse2_psra_w:
6038 case Intrinsic::x86_sse2_psra_d:
6039 case Intrinsic::x86_sse2_psrli_w:
6040 case Intrinsic::x86_sse2_psrli_d:
6041 case Intrinsic::x86_sse2_psrli_q:
6042 case Intrinsic::x86_sse2_psrai_w:
6043 case Intrinsic::x86_sse2_psrai_d:
6044 case Intrinsic::x86_mmx_psll_w:
6045 case Intrinsic::x86_mmx_psll_d:
6046 case Intrinsic::x86_mmx_psll_q:
6047 case Intrinsic::x86_mmx_pslli_w:
6048 case Intrinsic::x86_mmx_pslli_d:
6049 case Intrinsic::x86_mmx_pslli_q:
6050 case Intrinsic::x86_mmx_psrl_w:
6051 case Intrinsic::x86_mmx_psrl_d:
6052 case Intrinsic::x86_mmx_psrl_q:
6053 case Intrinsic::x86_mmx_psra_w:
6054 case Intrinsic::x86_mmx_psra_d:
6055 case Intrinsic::x86_mmx_psrli_w:
6056 case Intrinsic::x86_mmx_psrli_d:
6057 case Intrinsic::x86_mmx_psrli_q:
6058 case Intrinsic::x86_mmx_psrai_w:
6059 case Intrinsic::x86_mmx_psrai_d:
6060 handleVectorShiftIntrinsic(I, /* Variable */ false);
6061 break;
6062 case Intrinsic::x86_avx2_psllv_d:
6063 case Intrinsic::x86_avx2_psllv_d_256:
6064 case Intrinsic::x86_avx512_psllv_d_512:
6065 case Intrinsic::x86_avx2_psllv_q:
6066 case Intrinsic::x86_avx2_psllv_q_256:
6067 case Intrinsic::x86_avx512_psllv_q_512:
6068 case Intrinsic::x86_avx2_psrlv_d:
6069 case Intrinsic::x86_avx2_psrlv_d_256:
6070 case Intrinsic::x86_avx512_psrlv_d_512:
6071 case Intrinsic::x86_avx2_psrlv_q:
6072 case Intrinsic::x86_avx2_psrlv_q_256:
6073 case Intrinsic::x86_avx512_psrlv_q_512:
6074 case Intrinsic::x86_avx2_psrav_d:
6075 case Intrinsic::x86_avx2_psrav_d_256:
6076 case Intrinsic::x86_avx512_psrav_d_512:
6077 case Intrinsic::x86_avx512_psrav_q_128:
6078 case Intrinsic::x86_avx512_psrav_q_256:
6079 case Intrinsic::x86_avx512_psrav_q_512:
6080 handleVectorShiftIntrinsic(I, /* Variable */ true);
6081 break;
6082
6083 // Pack with Signed/Unsigned Saturation
6084 case Intrinsic::x86_sse2_packsswb_128:
6085 case Intrinsic::x86_sse2_packssdw_128:
6086 case Intrinsic::x86_sse2_packuswb_128:
6087 case Intrinsic::x86_sse41_packusdw:
6088 case Intrinsic::x86_avx2_packsswb:
6089 case Intrinsic::x86_avx2_packssdw:
6090 case Intrinsic::x86_avx2_packuswb:
6091 case Intrinsic::x86_avx2_packusdw:
6092 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
6093 // (<32 x i16> %a, <32 x i16> %b)
6094 // <32 x i16> @llvm.x86.avx512.packssdw.512
6095 // (<16 x i32> %a, <16 x i32> %b)
6096 // Note: AVX512 masked variants are auto-upgraded by LLVM.
6097 case Intrinsic::x86_avx512_packsswb_512:
6098 case Intrinsic::x86_avx512_packssdw_512:
6099 case Intrinsic::x86_avx512_packuswb_512:
6100 case Intrinsic::x86_avx512_packusdw_512:
6101 handleVectorPackIntrinsic(I);
6102 break;
6103
6104 case Intrinsic::x86_sse41_pblendvb:
6105 case Intrinsic::x86_sse41_blendvpd:
6106 case Intrinsic::x86_sse41_blendvps:
6107 case Intrinsic::x86_avx_blendv_pd_256:
6108 case Intrinsic::x86_avx_blendv_ps_256:
6109 case Intrinsic::x86_avx2_pblendvb:
6110 handleBlendvIntrinsic(I);
6111 break;
6112
6113 case Intrinsic::x86_avx_dp_ps_256:
6114 case Intrinsic::x86_sse41_dppd:
6115 case Intrinsic::x86_sse41_dpps:
6116 handleDppIntrinsic(I);
6117 break;
6118
6119 case Intrinsic::x86_mmx_packsswb:
6120 case Intrinsic::x86_mmx_packuswb:
6121 handleVectorPackIntrinsic(I, 16);
6122 break;
6123
6124 case Intrinsic::x86_mmx_packssdw:
6125 handleVectorPackIntrinsic(I, 32);
6126 break;
6127
6128 case Intrinsic::x86_mmx_psad_bw:
6129 handleVectorSadIntrinsic(I, true);
6130 break;
6131 case Intrinsic::x86_sse2_psad_bw:
6132 case Intrinsic::x86_avx2_psad_bw:
6133 handleVectorSadIntrinsic(I);
6134 break;
6135
6136 // Multiply and Add Packed Words
6137 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
6138 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
6139 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
6140 //
6141 // Multiply and Add Packed Signed and Unsigned Bytes
6142 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
6143 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
6144 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
6145 //
6146 // These intrinsics are auto-upgraded into non-masked forms:
6147 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
6148 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
6149 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
6150 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
6151 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
6152 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
6153 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
6154 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
6155 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
6156 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
6157 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
6158 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
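// For illustration only (assuming the standard pmaddwd semantics; Out/A/B are
// illustrative names): each i32 output element reduces one adjacent pair of
// products,
//   Out[i] = A[2*i] * B[2*i] + A[2*i+1] * B[2*i+1]
// hence ReductionFactor=2 below. A multiplicand that is a known-initialized
// zero forces its product to a defined zero regardless of the other operand,
// which is what ZeroPurifies models.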
6159 case Intrinsic::x86_sse2_pmadd_wd:
6160 case Intrinsic::x86_avx2_pmadd_wd:
6161 case Intrinsic::x86_avx512_pmaddw_d_512:
6162 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
6163 case Intrinsic::x86_avx2_pmadd_ub_sw:
6164 case Intrinsic::x86_avx512_pmaddubs_w_512:
6165 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6166 /*ZeroPurifies=*/true,
6167 /*EltSizeInBits=*/0,
6168 /*Lanes=*/kBothLanes);
6169 break;
6170
6171 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
6172 case Intrinsic::x86_ssse3_pmadd_ub_sw:
6173 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6174 /*ZeroPurifies=*/true,
6175 /*EltSizeInBits=*/8,
6176 /*Lanes=*/kBothLanes);
6177 break;
6178
6179 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
6180 case Intrinsic::x86_mmx_pmadd_wd:
6181 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6182 /*ZeroPurifies=*/true,
6183 /*EltSizeInBits=*/16,
6184 /*Lanes=*/kBothLanes);
6185 break;
6186
6187 // BFloat16 multiply-add to single-precision
6188 // <4 x float> llvm.aarch64.neon.bfmlalt
6189 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6190 case Intrinsic::aarch64_neon_bfmlalt:
6191 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6192 /*ZeroPurifies=*/false,
6193 /*EltSizeInBits=*/0,
6194 /*Lanes=*/kOddLanes);
6195 break;
6196
6197 // <4 x float> llvm.aarch64.neon.bfmlalb
6198 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6199 case Intrinsic::aarch64_neon_bfmlalb:
6200 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6201 /*ZeroPurifies=*/false,
6202 /*EltSizeInBits=*/0,
6203 /*Lanes=*/kEvenLanes);
6204 break;
6205
6206 // AVX Vector Neural Network Instructions: bytes
6207 //
6208 // Multiply and Add Signed Bytes
6209 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
6210 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6211 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
6212 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6213 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
6214 // (<16 x i32>, <64 x i8>, <64 x i8>)
6215 //
6216 // Multiply and Add Signed Bytes With Saturation
6217 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
6218 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6219 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
6220 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6221 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
6222 // (<16 x i32>, <64 x i8>, <64 x i8>)
6223 //
6224 // Multiply and Add Signed and Unsigned Bytes
6225 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
6226 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6227 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
6228 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6229 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
6230 // (<16 x i32>, <64 x i8>, <64 x i8>)
6231 //
6232 // Multiply and Add Signed and Unsigned Bytes With Saturation
6233 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
6234 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6235 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
6236 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6237 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
6238 // (<16 x i32>, <64 x i8>, <64 x i8>)
6239 //
6240 // Multiply and Add Unsigned and Signed Bytes
6241 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
6242 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6243 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
6244 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6245 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
6246 // (<16 x i32>, <64 x i8>, <64 x i8>)
6247 //
6248 // Multiply and Add Unsigned and Signed Bytes With Saturation
6249 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
6250 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6251 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
6252 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6253 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
6254 // (<16 x i32>, <64 x i8>, <64 x i8>)
6255 //
6256 // Multiply and Add Unsigned Bytes
6257 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
6258 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6259 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
6260 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6261 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
6262 // (<16 x i32>, <64 x i8>, <64 x i8>)
6263 //
6264 // Multiply and Add Unsigned Bytes With Saturation
6265 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
6266 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6267 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
6268 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6269 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
6270 // (<16 x i32>, <64 x i8>, <64 x i8>)
6271 //
6272 // These intrinsics are auto-upgraded into non-masked forms:
6273 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
6274 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6275 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
6276 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6277 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
6278 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6279 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
6280 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6281 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
6282 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6283 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
6284 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6285 //
6286 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
6287 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6288 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
6289 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6290 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
6291 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6292 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
6293 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6294 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
6295 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6296 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
6297 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
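// For illustration only (assuming the documented VNNI semantics; Out/Acc/A/B
// are illustrative names): vpdpbusd accumulates four byte products into each
// i32 element,
//   Out[i] = Acc[i] + Sum_{k=0..3} ZExt(A[4*i+k]) * SExt(B[4*i+k])
// hence ReductionFactor=4 below, and a known-initialized zero byte purifies
// the corresponding product (ZeroPurifies=true).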
6298 case Intrinsic::x86_avx512_vpdpbusd_128:
6299 case Intrinsic::x86_avx512_vpdpbusd_256:
6300 case Intrinsic::x86_avx512_vpdpbusd_512:
6301 case Intrinsic::x86_avx512_vpdpbusds_128:
6302 case Intrinsic::x86_avx512_vpdpbusds_256:
6303 case Intrinsic::x86_avx512_vpdpbusds_512:
6304 case Intrinsic::x86_avx2_vpdpbssd_128:
6305 case Intrinsic::x86_avx2_vpdpbssd_256:
6306 case Intrinsic::x86_avx10_vpdpbssd_512:
6307 case Intrinsic::x86_avx2_vpdpbssds_128:
6308 case Intrinsic::x86_avx2_vpdpbssds_256:
6309 case Intrinsic::x86_avx10_vpdpbssds_512:
6310 case Intrinsic::x86_avx2_vpdpbsud_128:
6311 case Intrinsic::x86_avx2_vpdpbsud_256:
6312 case Intrinsic::x86_avx10_vpdpbsud_512:
6313 case Intrinsic::x86_avx2_vpdpbsuds_128:
6314 case Intrinsic::x86_avx2_vpdpbsuds_256:
6315 case Intrinsic::x86_avx10_vpdpbsuds_512:
6316 case Intrinsic::x86_avx2_vpdpbuud_128:
6317 case Intrinsic::x86_avx2_vpdpbuud_256:
6318 case Intrinsic::x86_avx10_vpdpbuud_512:
6319 case Intrinsic::x86_avx2_vpdpbuuds_128:
6320 case Intrinsic::x86_avx2_vpdpbuuds_256:
6321 case Intrinsic::x86_avx10_vpdpbuuds_512:
6322 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6323 /*ZeroPurifies=*/true,
6324 /*EltSizeInBits=*/0,
6325 /*Lanes=*/kBothLanes);
6326 break;
6327
6328 // AVX Vector Neural Network Instructions: words
6329 //
6330 // Multiply and Add Signed Word Integers
6331 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
6332 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6333 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
6334 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6335 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
6336 // (<16 x i32>, <32 x i16>, <32 x i16>)
6337 //
6338 // Multiply and Add Signed Word Integers With Saturation
6339 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
6340 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6341 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
6342 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6343 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
6344 // (<16 x i32>, <32 x i16>, <32 x i16>)
6345 //
6346 // Multiply and Add Signed and Unsigned Word Integers
6347 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
6348 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6349 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
6350 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6351 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
6352 // (<16 x i32>, <32 x i16>, <32 x i16>)
6353 //
6354 // Multiply and Add Signed and Unsigned Word Integers With Saturation
6355 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
6356 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6357 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
6358 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6359 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
6360 // (<16 x i32>, <32 x i16>, <32 x i16>)
6361 //
6362 // Multiply and Add Unsigned and Signed Word Integers
6363 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
6364 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6365 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
6366 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6367 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
6368 // (<16 x i32>, <32 x i16>, <32 x i16>)
6369 //
6370 // Multiply and Add Unsigned and Signed Word Integers With Saturation
6371 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
6372 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6373 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
6374 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6375 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
6376 // (<16 x i32>, <32 x i16>, <32 x i16>)
6377 //
6378 // Multiply and Add Unsigned and Unsigned Word Integers
6379 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
6380 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6381 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
6382 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6383 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
6384 // (<16 x i32>, <32 x i16>, <32 x i16>)
6385 //
6386 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
6387 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
6388 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6389 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
6390 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6391 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
6392 // (<16 x i32>, <32 x i16>, <32 x i16>)
6393 //
6394 // These intrinsics are auto-upgraded into non-masked forms:
6395 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
6396 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6397 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
6398 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6399 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6400 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6401 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6402 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6403 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6404 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6405 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6406 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6407 //
6408 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6409 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6410 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6411 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6412 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6413 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6414 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6415 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6416 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6417 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6418 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6419 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
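// For illustration only (assuming the documented semantics; Out/Acc/A/B are
// illustrative names): vpdpwssd accumulates two word products into each i32
// element,
//   Out[i] = Acc[i] + A[2*i] * B[2*i] + A[2*i+1] * B[2*i+1]
// hence ReductionFactor=2 below.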
6420 case Intrinsic::x86_avx512_vpdpwssd_128:
6421 case Intrinsic::x86_avx512_vpdpwssd_256:
6422 case Intrinsic::x86_avx512_vpdpwssd_512:
6423 case Intrinsic::x86_avx512_vpdpwssds_128:
6424 case Intrinsic::x86_avx512_vpdpwssds_256:
6425 case Intrinsic::x86_avx512_vpdpwssds_512:
6426 case Intrinsic::x86_avx2_vpdpwsud_128:
6427 case Intrinsic::x86_avx2_vpdpwsud_256:
6428 case Intrinsic::x86_avx10_vpdpwsud_512:
6429 case Intrinsic::x86_avx2_vpdpwsuds_128:
6430 case Intrinsic::x86_avx2_vpdpwsuds_256:
6431 case Intrinsic::x86_avx10_vpdpwsuds_512:
6432 case Intrinsic::x86_avx2_vpdpwusd_128:
6433 case Intrinsic::x86_avx2_vpdpwusd_256:
6434 case Intrinsic::x86_avx10_vpdpwusd_512:
6435 case Intrinsic::x86_avx2_vpdpwusds_128:
6436 case Intrinsic::x86_avx2_vpdpwusds_256:
6437 case Intrinsic::x86_avx10_vpdpwusds_512:
6438 case Intrinsic::x86_avx2_vpdpwuud_128:
6439 case Intrinsic::x86_avx2_vpdpwuud_256:
6440 case Intrinsic::x86_avx10_vpdpwuud_512:
6441 case Intrinsic::x86_avx2_vpdpwuuds_128:
6442 case Intrinsic::x86_avx2_vpdpwuuds_256:
6443 case Intrinsic::x86_avx10_vpdpwuuds_512:
6444 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6445 /*ZeroPurifies=*/true,
6446 /*EltSizeInBits=*/0,
6447 /*Lanes=*/kBothLanes);
6448 break;
6449
6450 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6451 // Precision
6452 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6453 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6454 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6455 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6456 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6457 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
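// Unlike the integer dot products above, a zero floating-point operand does
// not guarantee a defined result (e.g., 0.0 * Inf is NaN), which is presumably
// why ZeroPurifies is false here.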
6458 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6459 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6460 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6461 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6462 /*ZeroPurifies=*/false,
6463 /*EltSizeInBits=*/0,
6464 /*Lanes=*/kBothLanes);
6465 break;
6466
6467 case Intrinsic::x86_sse_cmp_ss:
6468 case Intrinsic::x86_sse2_cmp_sd:
6469 case Intrinsic::x86_sse_comieq_ss:
6470 case Intrinsic::x86_sse_comilt_ss:
6471 case Intrinsic::x86_sse_comile_ss:
6472 case Intrinsic::x86_sse_comigt_ss:
6473 case Intrinsic::x86_sse_comige_ss:
6474 case Intrinsic::x86_sse_comineq_ss:
6475 case Intrinsic::x86_sse_ucomieq_ss:
6476 case Intrinsic::x86_sse_ucomilt_ss:
6477 case Intrinsic::x86_sse_ucomile_ss:
6478 case Intrinsic::x86_sse_ucomigt_ss:
6479 case Intrinsic::x86_sse_ucomige_ss:
6480 case Intrinsic::x86_sse_ucomineq_ss:
6481 case Intrinsic::x86_sse2_comieq_sd:
6482 case Intrinsic::x86_sse2_comilt_sd:
6483 case Intrinsic::x86_sse2_comile_sd:
6484 case Intrinsic::x86_sse2_comigt_sd:
6485 case Intrinsic::x86_sse2_comige_sd:
6486 case Intrinsic::x86_sse2_comineq_sd:
6487 case Intrinsic::x86_sse2_ucomieq_sd:
6488 case Intrinsic::x86_sse2_ucomilt_sd:
6489 case Intrinsic::x86_sse2_ucomile_sd:
6490 case Intrinsic::x86_sse2_ucomigt_sd:
6491 case Intrinsic::x86_sse2_ucomige_sd:
6492 case Intrinsic::x86_sse2_ucomineq_sd:
6493 handleVectorCompareScalarIntrinsic(I);
6494 break;
6495
6496 case Intrinsic::x86_avx_cmp_pd_256:
6497 case Intrinsic::x86_avx_cmp_ps_256:
6498 case Intrinsic::x86_sse2_cmp_pd:
6499 case Intrinsic::x86_sse_cmp_ps:
6500 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/true);
6501 break;
6502
6503 case Intrinsic::x86_bmi_bextr_32:
6504 case Intrinsic::x86_bmi_bextr_64:
6505 case Intrinsic::x86_bmi_bzhi_32:
6506 case Intrinsic::x86_bmi_bzhi_64:
6507 case Intrinsic::x86_bmi_pdep_32:
6508 case Intrinsic::x86_bmi_pdep_64:
6509 case Intrinsic::x86_bmi_pext_32:
6510 case Intrinsic::x86_bmi_pext_64:
6511 handleBmiIntrinsic(I);
6512 break;
6513
6514 case Intrinsic::x86_pclmulqdq:
6515 case Intrinsic::x86_pclmulqdq_256:
6516 case Intrinsic::x86_pclmulqdq_512:
6517 handlePclmulIntrinsic(I);
6518 break;
6519
6520 case Intrinsic::x86_avx_round_pd_256:
6521 case Intrinsic::x86_avx_round_ps_256:
6522 case Intrinsic::x86_sse41_round_pd:
6523 case Intrinsic::x86_sse41_round_ps:
6524 handleRoundPdPsIntrinsic(I);
6525 break;
6526
6527 case Intrinsic::x86_sse41_round_sd:
6528 case Intrinsic::x86_sse41_round_ss:
6529 handleUnarySdSsIntrinsic(I);
6530 break;
6531
6532 case Intrinsic::x86_sse2_max_sd:
6533 case Intrinsic::x86_sse_max_ss:
6534 case Intrinsic::x86_sse2_min_sd:
6535 case Intrinsic::x86_sse_min_ss:
6536 handleBinarySdSsIntrinsic(I);
6537 break;
6538
6539 case Intrinsic::x86_avx_vtestc_pd:
6540 case Intrinsic::x86_avx_vtestc_pd_256:
6541 case Intrinsic::x86_avx_vtestc_ps:
6542 case Intrinsic::x86_avx_vtestc_ps_256:
6543 case Intrinsic::x86_avx_vtestnzc_pd:
6544 case Intrinsic::x86_avx_vtestnzc_pd_256:
6545 case Intrinsic::x86_avx_vtestnzc_ps:
6546 case Intrinsic::x86_avx_vtestnzc_ps_256:
6547 case Intrinsic::x86_avx_vtestz_pd:
6548 case Intrinsic::x86_avx_vtestz_pd_256:
6549 case Intrinsic::x86_avx_vtestz_ps:
6550 case Intrinsic::x86_avx_vtestz_ps_256:
6551 case Intrinsic::x86_avx_ptestc_256:
6552 case Intrinsic::x86_avx_ptestnzc_256:
6553 case Intrinsic::x86_avx_ptestz_256:
6554 case Intrinsic::x86_sse41_ptestc:
6555 case Intrinsic::x86_sse41_ptestnzc:
6556 case Intrinsic::x86_sse41_ptestz:
6557 handleVtestIntrinsic(I);
6558 break;
6559
6560 // Packed Horizontal Add/Subtract
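// e.g., <8 x i16> @llvm.x86.ssse3.phadd.w.128(<8 x i16> %a, <8 x i16> %b)
// (signature shown for illustration). The 256-bit AVX2 variants further down
// add/subtract pairwise within each 128-bit lane, which is presumably why
// they use Shards=2 while the 128-bit forms here use Shards=1.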
6561 case Intrinsic::x86_ssse3_phadd_w:
6562 case Intrinsic::x86_ssse3_phadd_w_128:
6563 case Intrinsic::x86_ssse3_phsub_w:
6564 case Intrinsic::x86_ssse3_phsub_w_128:
6565 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6566 /*ReinterpretElemWidth=*/16);
6567 break;
6568
6569 case Intrinsic::x86_avx2_phadd_w:
6570 case Intrinsic::x86_avx2_phsub_w:
6571 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6572 /*ReinterpretElemWidth=*/16);
6573 break;
6574
6575 // Packed Horizontal Add/Subtract
6576 case Intrinsic::x86_ssse3_phadd_d:
6577 case Intrinsic::x86_ssse3_phadd_d_128:
6578 case Intrinsic::x86_ssse3_phsub_d:
6579 case Intrinsic::x86_ssse3_phsub_d_128:
6580 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6581 /*ReinterpretElemWidth=*/32);
6582 break;
6583
6584 case Intrinsic::x86_avx2_phadd_d:
6585 case Intrinsic::x86_avx2_phsub_d:
6586 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6587 /*ReinterpretElemWidth=*/32);
6588 break;
6589
6590 // Packed Horizontal Add/Subtract and Saturate
6591 case Intrinsic::x86_ssse3_phadd_sw:
6592 case Intrinsic::x86_ssse3_phadd_sw_128:
6593 case Intrinsic::x86_ssse3_phsub_sw:
6594 case Intrinsic::x86_ssse3_phsub_sw_128:
6595 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6596 /*ReinterpretElemWidth=*/16);
6597 break;
6598
6599 case Intrinsic::x86_avx2_phadd_sw:
6600 case Intrinsic::x86_avx2_phsub_sw:
6601 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6602 /*ReinterpretElemWidth=*/16);
6603 break;
6604
6605 // Packed Single/Double Precision Floating-Point Horizontal Add
6606 case Intrinsic::x86_sse3_hadd_ps:
6607 case Intrinsic::x86_sse3_hadd_pd:
6608 case Intrinsic::x86_sse3_hsub_ps:
6609 case Intrinsic::x86_sse3_hsub_pd:
6610 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6611 break;
6612
6613 case Intrinsic::x86_avx_hadd_pd_256:
6614 case Intrinsic::x86_avx_hadd_ps_256:
6615 case Intrinsic::x86_avx_hsub_pd_256:
6616 case Intrinsic::x86_avx_hsub_ps_256:
6617 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6618 break;
6619
6620 case Intrinsic::x86_avx_maskstore_ps:
6621 case Intrinsic::x86_avx_maskstore_pd:
6622 case Intrinsic::x86_avx_maskstore_ps_256:
6623 case Intrinsic::x86_avx_maskstore_pd_256:
6624 case Intrinsic::x86_avx2_maskstore_d:
6625 case Intrinsic::x86_avx2_maskstore_q:
6626 case Intrinsic::x86_avx2_maskstore_d_256:
6627 case Intrinsic::x86_avx2_maskstore_q_256: {
6628 handleAVXMaskedStore(I);
6629 break;
6630 }
6631
6632 case Intrinsic::x86_avx_maskload_ps:
6633 case Intrinsic::x86_avx_maskload_pd:
6634 case Intrinsic::x86_avx_maskload_ps_256:
6635 case Intrinsic::x86_avx_maskload_pd_256:
6636 case Intrinsic::x86_avx2_maskload_d:
6637 case Intrinsic::x86_avx2_maskload_q:
6638 case Intrinsic::x86_avx2_maskload_d_256:
6639 case Intrinsic::x86_avx2_maskload_q_256: {
6640 handleAVXMaskedLoad(I);
6641 break;
6642 }
6643
6644 // Packed Floating-Point Arithmetic and Min/Max
6645 case Intrinsic::x86_avx512fp16_add_ph_512:
6646 case Intrinsic::x86_avx512fp16_sub_ph_512:
6647 case Intrinsic::x86_avx512fp16_mul_ph_512:
6648 case Intrinsic::x86_avx512fp16_div_ph_512:
6649 case Intrinsic::x86_avx512fp16_max_ph_512:
6650 case Intrinsic::x86_avx512fp16_min_ph_512:
6651 case Intrinsic::x86_avx512_min_ps_512:
6652 case Intrinsic::x86_avx512_min_pd_512:
6653 case Intrinsic::x86_avx512_max_ps_512:
6654 case Intrinsic::x86_avx512_max_pd_512: {
6655 // These AVX512 variants contain the rounding mode as a trailing flag.
6656 // Earlier variants do not have a trailing flag and are already handled
6657 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6658 // maybeHandleUnknownIntrinsic.
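// e.g. (signature assumed, for illustration):
//   <16 x float> @llvm.x86.avx512.max.ps.512
//       (<16 x float> %a, <16 x float> %b, i32 %rounding)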
6659 [[maybe_unused]] bool Success =
6660 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6661 assert(Success);
6662 break;
6663 }
6664
6665 case Intrinsic::x86_avx_vpermilvar_pd:
6666 case Intrinsic::x86_avx_vpermilvar_pd_256:
6667 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6668 case Intrinsic::x86_avx_vpermilvar_ps:
6669 case Intrinsic::x86_avx_vpermilvar_ps_256:
6670 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6671 handleAVXVpermilvar(I);
6672 break;
6673 }
6674
6675 case Intrinsic::x86_avx512_vpermi2var_d_128:
6676 case Intrinsic::x86_avx512_vpermi2var_d_256:
6677 case Intrinsic::x86_avx512_vpermi2var_d_512:
6678 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6679 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6680 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6681 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6682 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6683 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6684 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6685 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6686 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6687 case Intrinsic::x86_avx512_vpermi2var_q_128:
6688 case Intrinsic::x86_avx512_vpermi2var_q_256:
6689 case Intrinsic::x86_avx512_vpermi2var_q_512:
6690 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6691 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6692 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6693 handleAVXVpermi2var(I);
6694 break;
6695
6696 // Packed Shuffle
6697 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6698 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6699 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6700 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6701 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6702 //
6703 // The following intrinsics are auto-upgraded:
6704 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6705 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6706 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6707 case Intrinsic::x86_avx2_pshuf_b:
6708 case Intrinsic::x86_sse_pshuf_w:
6709 case Intrinsic::x86_ssse3_pshuf_b_128:
6710 case Intrinsic::x86_ssse3_pshuf_b:
6711 case Intrinsic::x86_avx512_pshuf_b_512:
6712 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6713 /*trailingVerbatimArgs=*/1);
6714 break;
6715
6716 // AVX512 PMOV: Packed MOV, with truncation
6717 // Precisely handled by applying the same intrinsic to the shadow
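// e.g. (signature assumed, for illustration):
//   <8 x i16> @llvm.x86.avx512.mask.pmov.dw.128
//       (<4 x i32> %a, <8 x i16> %writethru, i8 %mask)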
6718 case Intrinsic::x86_avx512_mask_pmov_dw_128:
6719 case Intrinsic::x86_avx512_mask_pmov_db_128:
6720 case Intrinsic::x86_avx512_mask_pmov_qb_128:
6721 case Intrinsic::x86_avx512_mask_pmov_qw_128:
6722 case Intrinsic::x86_avx512_mask_pmov_qd_128:
6723 case Intrinsic::x86_avx512_mask_pmov_wb_128:
6724 case Intrinsic::x86_avx512_mask_pmov_dw_256:
6725 case Intrinsic::x86_avx512_mask_pmov_db_256:
6726 case Intrinsic::x86_avx512_mask_pmov_qb_256:
6727 case Intrinsic::x86_avx512_mask_pmov_qw_256:
6728 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6729 case Intrinsic::x86_avx512_mask_pmov_db_512:
6730 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6731 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6732 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_{256,512} were removed in
6733 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6734 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6735 /*trailingVerbatimArgs=*/1);
6736 break;
6737 }
6738
6739 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6740 // Approximately handled using the corresponding truncation intrinsic
6741 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6742 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6743 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6744 handleIntrinsicByApplyingToShadow(I,
6745 Intrinsic::x86_avx512_mask_pmov_dw_512,
6746 /*trailingVerbatimArgs=*/1);
6747 break;
6748 }
6749
6750 case Intrinsic::x86_avx512_mask_pmovs_dw_256:
6751 case Intrinsic::x86_avx512_mask_pmovus_dw_256:
6752 handleIntrinsicByApplyingToShadow(I,
6753 Intrinsic::x86_avx512_mask_pmov_dw_256,
6754 /*trailingVerbatimArgs=*/1);
6755 break;
6756
6757 case Intrinsic::x86_avx512_mask_pmovs_dw_128:
6758 case Intrinsic::x86_avx512_mask_pmovus_dw_128:
6759 handleIntrinsicByApplyingToShadow(I,
6760 Intrinsic::x86_avx512_mask_pmov_dw_128,
6761 /*trailingVerbatimArgs=*/1);
6762 break;
6763
6764 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6765 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6766 handleIntrinsicByApplyingToShadow(I,
6767 Intrinsic::x86_avx512_mask_pmov_db_512,
6768 /*trailingVerbatimArgs=*/1);
6769 break;
6770 }
6771
6772 case Intrinsic::x86_avx512_mask_pmovs_db_256:
6773 case Intrinsic::x86_avx512_mask_pmovus_db_256:
6774 handleIntrinsicByApplyingToShadow(I,
6775 Intrinsic::x86_avx512_mask_pmov_db_256,
6776 /*trailingVerbatimArgs=*/1);
6777 break;
6778
6779 case Intrinsic::x86_avx512_mask_pmovs_db_128:
6780 case Intrinsic::x86_avx512_mask_pmovus_db_128:
6781 handleIntrinsicByApplyingToShadow(I,
6782 Intrinsic::x86_avx512_mask_pmov_db_128,
6783 /*trailingVerbatimArgs=*/1);
6784 break;
6785
6786 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6787 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6788 handleIntrinsicByApplyingToShadow(I,
6789 Intrinsic::x86_avx512_mask_pmov_qb_512,
6790 /*trailingVerbatimArgs=*/1);
6791 break;
6792 }
6793
6794 case Intrinsic::x86_avx512_mask_pmovs_qb_256:
6795 case Intrinsic::x86_avx512_mask_pmovus_qb_256:
6796 handleIntrinsicByApplyingToShadow(I,
6797 Intrinsic::x86_avx512_mask_pmov_qb_256,
6798 /*trailingVerbatimArgs=*/1);
6799 break;
6800
6801 case Intrinsic::x86_avx512_mask_pmovs_qb_128:
6802 case Intrinsic::x86_avx512_mask_pmovus_qb_128:
6803 handleIntrinsicByApplyingToShadow(I,
6804 Intrinsic::x86_avx512_mask_pmov_qb_128,
6805 /*trailingVerbatimArgs=*/1);
6806 break;
6807
6808 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6809 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6810 handleIntrinsicByApplyingToShadow(I,
6811 Intrinsic::x86_avx512_mask_pmov_qw_512,
6812 /*trailingVerbatimArgs=*/1);
6813 break;
6814 }
6815
6816 case Intrinsic::x86_avx512_mask_pmovs_qw_256:
6817 case Intrinsic::x86_avx512_mask_pmovus_qw_256:
6818 handleIntrinsicByApplyingToShadow(I,
6819 Intrinsic::x86_avx512_mask_pmov_qw_256,
6820 /*trailingVerbatimArgs=*/1);
6821 break;
6822
6823 case Intrinsic::x86_avx512_mask_pmovs_qw_128:
6824 case Intrinsic::x86_avx512_mask_pmovus_qw_128:
6825 handleIntrinsicByApplyingToShadow(I,
6826 Intrinsic::x86_avx512_mask_pmov_qw_128,
6827 /*trailingVerbatimArgs=*/1);
6828 break;
6829
6830 case Intrinsic::x86_avx512_mask_pmovs_qd_128:
6831 case Intrinsic::x86_avx512_mask_pmovus_qd_128:
6832 handleIntrinsicByApplyingToShadow(I,
6833 Intrinsic::x86_avx512_mask_pmov_qd_128,
6834 /*trailingVerbatimArgs=*/1);
6835 break;
6836
6837 case Intrinsic::x86_avx512_mask_pmovs_wb_128:
6838 case Intrinsic::x86_avx512_mask_pmovus_wb_128:
6839 handleIntrinsicByApplyingToShadow(I,
6840 Intrinsic::x86_avx512_mask_pmov_wb_128,
6841 /*trailingVerbatimArgs=*/1);
6842 break;
6843
6844 case Intrinsic::x86_avx512_mask_pmovs_qd_256:
6845 case Intrinsic::x86_avx512_mask_pmovus_qd_256:
6846 case Intrinsic::x86_avx512_mask_pmovs_wb_256:
6847 case Intrinsic::x86_avx512_mask_pmovus_wb_256:
6848 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6849 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6850 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6851 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6852 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_{256,512} do not exist,
6853 // we cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6854 // slow-path handler.
6855 handleAVX512VectorDownConvert(I);
6856 break;
6857 }
6858
6859 // AVX512/AVX10 Reciprocal Square Root
6860 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6861 // (<16 x float>, <16 x float>, i16)
6862 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6863 // (<8 x float>, <8 x float>, i8)
6864 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6865 // (<4 x float>, <4 x float>, i8)
6866 //
6867 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6868 // (<8 x double>, <8 x double>, i8)
6869 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6870 // (<4 x double>, <4 x double>, i8)
6871 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6872 // (<2 x double>, <2 x double>, i8)
6873 //
6874 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6875 // (<32 x bfloat>, <32 x bfloat>, i32)
6876 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6877 // (<16 x bfloat>, <16 x bfloat>, i16)
6878 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6879 // (<8 x bfloat>, <8 x bfloat>, i8)
6880 //
6881 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6882 // (<32 x half>, <32 x half>, i32)
6883 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6884 // (<16 x half>, <16 x half>, i16)
6885 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6886 // (<8 x half>, <8 x half>, i8)
6887 //
6888 // TODO: 3-operand variants are not handled:
6889 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6890 // (<2 x double>, <2 x double>, <2 x double>, i8)
6891 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6892 // (<4 x float>, <4 x float>, <4 x float>, i8)
6893 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6894 // (<8 x half>, <8 x half>, <8 x half>, i8)
6895 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6896 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6897 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6898 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6899 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6900 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6901 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6902 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6903 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6904 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6905 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6906 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6907 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6908 /*WriteThruIndex=*/1,
6909 /*MaskIndex=*/2);
6910 break;
6911
6912 // AVX512/AVX10 Reciprocal
6913 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6914 // (<16 x float>, <16 x float>, i16)
6915 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6916 // (<8 x float>, <8 x float>, i8)
6917 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6918 // (<4 x float>, <4 x float>, i8)
6919 //
6920 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6921 // (<8 x double>, <8 x double>, i8)
6922 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6923 // (<4 x double>, <4 x double>, i8)
6924 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6925 // (<2 x double>, <2 x double>, i8)
6926 //
6927 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6928 // (<32 x bfloat>, <32 x bfloat>, i32)
6929 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6930 // (<16 x bfloat>, <16 x bfloat>, i16)
6931 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6932 // (<8 x bfloat>, <8 x bfloat>, i8)
6933 //
6934 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6935 // (<32 x half>, <32 x half>, i32)
6936 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6937 // (<16 x half>, <16 x half>, i16)
6938 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6939 // (<8 x half>, <8 x half>, i8)
6940 //
6941 // TODO: 3-operand variants are not handled:
6942 // <2 x double> @llvm.x86.avx512.rcp14.sd
6943 // (<2 x double>, <2 x double>, <2 x double>, i8)
6944 // <4 x float> @llvm.x86.avx512.rcp14.ss
6945 // (<4 x float>, <4 x float>, <4 x float>, i8)
6946 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6947 // (<8 x half>, <8 x half>, <8 x half>, i8)
6948 case Intrinsic::x86_avx512_rcp14_ps_512:
6949 case Intrinsic::x86_avx512_rcp14_ps_256:
6950 case Intrinsic::x86_avx512_rcp14_ps_128:
6951 case Intrinsic::x86_avx512_rcp14_pd_512:
6952 case Intrinsic::x86_avx512_rcp14_pd_256:
6953 case Intrinsic::x86_avx512_rcp14_pd_128:
6954 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6955 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6956 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6957 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6958 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6959 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6960 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6961 /*WriteThruIndex=*/1,
6962 /*MaskIndex=*/2);
6963 break;
6964
6965 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6966 // (<32 x half>, i32, <32 x half>, i32, i32)
6967 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6968 // (<16 x half>, i32, <16 x half>, i32, i16)
6969 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6970 // (<8 x half>, i32, <8 x half>, i32, i8)
6971 //
6972 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6973 // (<16 x float>, i32, <16 x float>, i16, i32)
6974 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6975 // (<8 x float>, i32, <8 x float>, i8)
6976 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6977 // (<4 x float>, i32, <4 x float>, i8)
6978 //
6979 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6980 // (<8 x double>, i32, <8 x double>, i8, i32)
6981 // A Imm WriteThru Mask Rounding
6982 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6983 // (<4 x double>, i32, <4 x double>, i8)
6984 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6985 // (<2 x double>, i32, <2 x double>, i8)
6986 // A Imm WriteThru Mask
6987 //
6988 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6989 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6990 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6991 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6992 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6993 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6994 //
6995 // Not supported: three vectors
6996 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6997 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6998 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6999 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
7000 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
7001 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
7002 // i32)
7003 // A B WriteThru Mask Imm
7004 // Rounding
7005 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
7006 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
7007 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
7008 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
7009 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
7010 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
7011 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
7012 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
7013 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
7014 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
7015 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
7016 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
7017 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
7018 /*WriteThruIndex=*/2,
7019 /*MaskIndex=*/3);
7020 break;
7021
7022 // AVX512 Vector Scale Float* Packed
7023 //
7024 // < 8 x double> @llvm.x86.avx512.mask.scalef.pd.512
7025 // (<8 x double>, <8 x double>, <8 x double>, i8, i32)
7026 // A B WriteThru Msk Round
7027 // < 4 x double> @llvm.x86.avx512.mask.scalef.pd.256
7028 // (<4 x double>, <4 x double>, <4 x double>, i8)
7029 // < 2 x double> @llvm.x86.avx512.mask.scalef.pd.128
7030 // (<2 x double>, <2 x double>, <2 x double>, i8)
7031 //
7032 // <16 x float> @llvm.x86.avx512.mask.scalef.ps.512
7033 // (<16 x float>, <16 x float>, <16 x float>, i16, i32)
7034 // < 8 x float> @llvm.x86.avx512.mask.scalef.ps.256
7035 // (<8 x float>, <8 x float>, <8 x float>, i8)
7036 // < 4 x float> @llvm.x86.avx512.mask.scalef.ps.128
7037 // (<4 x float>, <4 x float>, <4 x float>, i8)
7038 //
7039 // <32 x half> @llvm.x86.avx512fp16.mask.scalef.ph.512
7040 // (<32 x half>, <32 x half>, <32 x half>, i32, i32)
7041 // <16 x half> @llvm.x86.avx512fp16.mask.scalef.ph.256
7042 // (<16 x half>, <16 x half>, <16 x half>, i16)
7043 // < 8 x half> @llvm.x86.avx512fp16.mask.scalef.ph.128
7044 // (<8 x half>, <8 x half>, <8 x half>, i8)
7045 //
7046 // TODO: AVX10
7047 // <32 x bfloat> @llvm.x86.avx10.mask.scalef.bf16.512
7048 // (<32 x bfloat>, <32 x bfloat>, <32 x bfloat>, i32)
7049 // <16 x bfloat> @llvm.x86.avx10.mask.scalef.bf16.256
7050 // (<16 x bfloat>, <16 x bfloat>, <16 x bfloat>, i16)
7051 // < 8 x bfloat> @llvm.x86.avx10.mask.scalef.bf16.128
7052 // (<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i8)
7053 case Intrinsic::x86_avx512_mask_scalef_pd_512:
7054 case Intrinsic::x86_avx512_mask_scalef_pd_256:
7055 case Intrinsic::x86_avx512_mask_scalef_pd_128:
7056 case Intrinsic::x86_avx512_mask_scalef_ps_512:
7057 case Intrinsic::x86_avx512_mask_scalef_ps_256:
7058 case Intrinsic::x86_avx512_mask_scalef_ps_128:
7059 case Intrinsic::x86_avx512fp16_mask_scalef_ph_512:
7060 case Intrinsic::x86_avx512fp16_mask_scalef_ph_256:
7061 case Intrinsic::x86_avx512fp16_mask_scalef_ph_128:
7062 // The AVX512 512-bit operand variants have an extra operand (the
7063 // Rounding mode). The extra operand, if present, will be
7064 // automatically checked by the handler.
7065 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0, 1},
7066 /*WriteThruIndex=*/2,
7067 /*MaskIndex=*/3);
7068 break;
7069
7070 // TODO: AVX512 Vector Scale Float* Scalar
7071 //
7072 // This is different from the Packed variant, because some bits are copied,
7073 // and some bits are zeroed.
7074 //
7075 // < 4 x float> @llvm.x86.avx512.mask.scalef.ss
7076 // (<4 x float>, <4 x float>, <4 x float>, i8, i32)
7077 //
7078 // < 2 x double> @llvm.x86.avx512.mask.scalef.sd
7079 // (<2 x double>, <2 x double>, <2 x double>, i8, i32)
7080 //
7081 // < 8 x half> @llvm.x86.avx512fp16.mask.scalef.sh
7082 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
7083
7084 // AVX512 FP16 Arithmetic
7085 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
7086 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
7087 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
7088 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
7089 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
7090 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
7091 visitGenericScalarHalfwordInst(I);
7092 break;
7093 }
7094
7095 // AVX Galois Field New Instructions
7096 case Intrinsic::x86_vgf2p8affineqb_128:
7097 case Intrinsic::x86_vgf2p8affineqb_256:
7098 case Intrinsic::x86_vgf2p8affineqb_512:
7099 handleAVXGF2P8Affine(I);
7100 break;
7101
7102 default:
7103 return false;
7104 }
7105
7106 return true;
7107 }
7108
7109 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
7110 switch (I.getIntrinsicID()) {
7111 // Two operands e.g.,
7112 // - <8 x i8> @llvm.aarch64.neon.rshrn.v8i8 (<8 x i16>, i32)
7113 // - <4 x i16> @llvm.aarch64.neon.uqrshl.v4i16(<4 x i16>, <4 x i16>)
7114 case Intrinsic::aarch64_neon_rshrn:
7115 case Intrinsic::aarch64_neon_sqrshl:
7116 case Intrinsic::aarch64_neon_sqrshrn:
7117 case Intrinsic::aarch64_neon_sqrshrun:
7118 case Intrinsic::aarch64_neon_sqshl:
7119 case Intrinsic::aarch64_neon_sqshlu:
7120 case Intrinsic::aarch64_neon_sqshrn:
7121 case Intrinsic::aarch64_neon_sqshrun:
7122 case Intrinsic::aarch64_neon_srshl:
7123 case Intrinsic::aarch64_neon_sshl:
7124 case Intrinsic::aarch64_neon_uqrshl:
7125 case Intrinsic::aarch64_neon_uqrshrn:
7126 case Intrinsic::aarch64_neon_uqshl:
7127 case Intrinsic::aarch64_neon_uqshrn:
7128 case Intrinsic::aarch64_neon_urshl:
7129 case Intrinsic::aarch64_neon_ushl:
7130 handleVectorShiftIntrinsic(I, /* Variable */ false);
7131 break;
7132
7133 // Vector Shift Left/Right and Insert
7134 //
7135 // Three operands e.g.,
7136 // - <4 x i16> @llvm.aarch64.neon.vsli.v4i16
7137 // (<4 x i16> %a, <4 x i16> %b, i32 %n)
7138 // - <16 x i8> @llvm.aarch64.neon.vsri.v16i8
7139 // (<16 x i8> %a, <16 x i8> %b, i32 %n)
7140 //
7141 // %b is shifted by %n bits, and the "missing" bits are filled in with %a
7142 // (instead of zero-extending/sign-extending).
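// For example (following the description above), for vsli with %n == 3:
//   Out = (%b << 3) | (%a & 0b111)
// so the low 3 bits of the result, and of its shadow, come from %a.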
7143 case Intrinsic::aarch64_neon_vsli:
7144 case Intrinsic::aarch64_neon_vsri:
7145 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
7146 /*trailingVerbatimArgs=*/1);
7147 break;
7148
7149 // TODO: handling max/min similarly to AND/OR may be more precise
7150 // Floating-Point Maximum/Minimum Pairwise
7151 case Intrinsic::aarch64_neon_fmaxp:
7152 case Intrinsic::aarch64_neon_fminp:
7153 // Floating-Point Maximum/Minimum Number Pairwise
7154 case Intrinsic::aarch64_neon_fmaxnmp:
7155 case Intrinsic::aarch64_neon_fminnmp:
7156 // Signed/Unsigned Maximum/Minimum Pairwise
7157 case Intrinsic::aarch64_neon_smaxp:
7158 case Intrinsic::aarch64_neon_sminp:
7159 case Intrinsic::aarch64_neon_umaxp:
7160 case Intrinsic::aarch64_neon_uminp:
7161 // Add Pairwise
7162 case Intrinsic::aarch64_neon_addp:
7163 // Floating-point Add Pairwise
7164 case Intrinsic::aarch64_neon_faddp:
7165 // Add Long Pairwise
7166 case Intrinsic::aarch64_neon_saddlp:
7167 case Intrinsic::aarch64_neon_uaddlp: {
7168 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
7169 break;
7170 }
7171
7172 // Floating-point Convert to integer, rounding to nearest with ties to Away
7173 case Intrinsic::aarch64_neon_fcvtas:
7174 case Intrinsic::aarch64_neon_fcvtau:
7175 // Floating-point convert to integer, rounding toward minus infinity
7176 case Intrinsic::aarch64_neon_fcvtms:
7177 case Intrinsic::aarch64_neon_fcvtmu:
7178 // Floating-point convert to integer, rounding to nearest with ties to even
7179 case Intrinsic::aarch64_neon_fcvtns:
7180 case Intrinsic::aarch64_neon_fcvtnu:
7181 // Floating-point convert to integer, rounding toward plus infinity
7182 case Intrinsic::aarch64_neon_fcvtps:
7183 case Intrinsic::aarch64_neon_fcvtpu:
7184 // Floating-point Convert to integer, rounding toward Zero
7185 case Intrinsic::aarch64_neon_fcvtzs:
7186 case Intrinsic::aarch64_neon_fcvtzu:
7187 // Floating-point convert to lower precision narrow, rounding to odd
7188 case Intrinsic::aarch64_neon_fcvtxn:
7189 // Vector Conversions Between Half-Precision and Single-Precision
7190 case Intrinsic::aarch64_neon_vcvthf2fp:
7191 case Intrinsic::aarch64_neon_vcvtfp2hf:
7192 handleGenericVectorConvertIntrinsic(I, /*FixedPoint=*/false);
7193 break;
7194
7195 // Vector Conversions Between Fixed-Point and Floating-Point
7196 case Intrinsic::aarch64_neon_vcvtfxs2fp:
7197 case Intrinsic::aarch64_neon_vcvtfp2fxs:
7198 case Intrinsic::aarch64_neon_vcvtfxu2fp:
7199 case Intrinsic::aarch64_neon_vcvtfp2fxu:
7200 handleGenericVectorConvertIntrinsic(I, /*FixedPoint=*/true);
7201 break;
7202
7203 // TODO: bfloat conversions
7204 // - bfloat @llvm.aarch64.neon.bfcvt(float)
7205 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn(<4 x float>)
7206 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn2(<8 x bfloat>, <4 x float>)
7207
7208 // Add reduction to scalar
7209 case Intrinsic::aarch64_neon_faddv:
7210 case Intrinsic::aarch64_neon_saddv:
7211 case Intrinsic::aarch64_neon_uaddv:
7212 // Signed/Unsigned min/max (Vector)
7213 // TODO: handling similarly to AND/OR may be more precise.
7214 case Intrinsic::aarch64_neon_smaxv:
7215 case Intrinsic::aarch64_neon_sminv:
7216 case Intrinsic::aarch64_neon_umaxv:
7217 case Intrinsic::aarch64_neon_uminv:
7218 // Floating-point min/max (vector)
7219 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
7220 // but our shadow propagation is the same.
7221 case Intrinsic::aarch64_neon_fmaxv:
7222 case Intrinsic::aarch64_neon_fminv:
7223 case Intrinsic::aarch64_neon_fmaxnmv:
7224 case Intrinsic::aarch64_neon_fminnmv:
7225 // Sum long across vector
7226 case Intrinsic::aarch64_neon_saddlv:
7227 case Intrinsic::aarch64_neon_uaddlv:
7228 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
7229 break;
7230
7231 case Intrinsic::aarch64_neon_ld1x2:
7232 case Intrinsic::aarch64_neon_ld1x3:
7233 case Intrinsic::aarch64_neon_ld1x4:
7234 case Intrinsic::aarch64_neon_ld2:
7235 case Intrinsic::aarch64_neon_ld3:
7236 case Intrinsic::aarch64_neon_ld4:
7237 case Intrinsic::aarch64_neon_ld2r:
7238 case Intrinsic::aarch64_neon_ld3r:
7239 case Intrinsic::aarch64_neon_ld4r: {
7240 handleNEONVectorLoad(I, /*WithLane=*/false);
7241 break;
7242 }
7243
7244 case Intrinsic::aarch64_neon_ld2lane:
7245 case Intrinsic::aarch64_neon_ld3lane:
7246 case Intrinsic::aarch64_neon_ld4lane: {
7247 handleNEONVectorLoad(I, /*WithLane=*/true);
7248 break;
7249 }
7250
7251 // Saturating extract narrow
7252 case Intrinsic::aarch64_neon_sqxtn:
7253 case Intrinsic::aarch64_neon_sqxtun:
7254 case Intrinsic::aarch64_neon_uqxtn:
7255 // These only have one argument, but we (ab)use handleShadowOr because it
7256 // does work on single argument intrinsics and will typecast the shadow
7257 // (and update the origin).
7258 handleShadowOr(I);
7259 break;
7260
7261 case Intrinsic::aarch64_neon_st1x2:
7262 case Intrinsic::aarch64_neon_st1x3:
7263 case Intrinsic::aarch64_neon_st1x4:
7264 case Intrinsic::aarch64_neon_st2:
7265 case Intrinsic::aarch64_neon_st3:
7266 case Intrinsic::aarch64_neon_st4: {
7267 handleNEONVectorStoreIntrinsic(I, false);
7268 break;
7269 }
7270
7271 case Intrinsic::aarch64_neon_st2lane:
7272 case Intrinsic::aarch64_neon_st3lane:
7273 case Intrinsic::aarch64_neon_st4lane: {
7274 handleNEONVectorStoreIntrinsic(I, true);
7275 break;
7276 }
7277
7278 // Arm NEON vector table intrinsics have the source/table register(s) as
7279 // arguments, followed by the index register. They return the output.
7280 //
7281 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
7282 // original value unchanged in the destination register.'
7283 // Conveniently, zero denotes a clean shadow, which means out-of-range
7284 // indices for TBL will initialize the user data with zero and also clean
7285 // the shadow. (For TBX, neither the user data nor the shadow will be
7286 // updated, which is also correct.)
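// Example signatures (assumed, for illustration):
//   <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %table, <8 x i8> %idx)
//   <8 x i8> @llvm.aarch64.neon.tbx1.v8i8(<8 x i8> %fallback, <16 x i8> %table,
//                                         <8 x i8> %idx)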
7287 case Intrinsic::aarch64_neon_tbl1:
7288 case Intrinsic::aarch64_neon_tbl2:
7289 case Intrinsic::aarch64_neon_tbl3:
7290 case Intrinsic::aarch64_neon_tbl4:
7291 case Intrinsic::aarch64_neon_tbx1:
7292 case Intrinsic::aarch64_neon_tbx2:
7293 case Intrinsic::aarch64_neon_tbx3:
7294 case Intrinsic::aarch64_neon_tbx4: {
7295 // The last trailing argument (index register) should be handled verbatim
7296 handleIntrinsicByApplyingToShadow(
7297 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
7298 /*trailingVerbatimArgs*/ 1);
7299 break;
7300 }
7301
7302 case Intrinsic::aarch64_neon_fmulx:
7303 case Intrinsic::aarch64_neon_pmul:
7304 case Intrinsic::aarch64_neon_pmull:
7305 case Intrinsic::aarch64_neon_smull:
7306 case Intrinsic::aarch64_neon_pmull64:
7307 case Intrinsic::aarch64_neon_umull: {
7308 handleNEONVectorMultiplyIntrinsic(I);
7309 break;
7310 }
7311
7312 case Intrinsic::aarch64_neon_smmla:
7313 case Intrinsic::aarch64_neon_ummla:
7314 case Intrinsic::aarch64_neon_usmmla:
7315 case Intrinsic::aarch64_neon_bfmmla:
7316 handleNEONMatrixMultiply(I);
7317 break;
7318
7319 // <2 x i32> @llvm.aarch64.neon.{u,s,us}dot.v2i32.v8i8
7320 // (<2 x i32> %acc, <8 x i8> %a, <8 x i8> %b)
7321 // <4 x i32> @llvm.aarch64.neon.{u,s,us}dot.v4i32.v16i8
7322 // (<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b)
7323 case Intrinsic::aarch64_neon_sdot:
7324 case Intrinsic::aarch64_neon_udot:
7325 case Intrinsic::aarch64_neon_usdot:
7326 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
7327 /*ZeroPurifies=*/true,
7328 /*EltSizeInBits=*/0,
7329 /*Lanes=*/kBothLanes);
7330 break;
7331
7332 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
7333 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
7334 // <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16
7335 // (<4 x float> %acc, <8 x bfloat> %a, <8 x bfloat> %b)
7336 case Intrinsic::aarch64_neon_bfdot:
7337 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
7338 /*ZeroPurifies=*/false,
7339 /*EltSizeInBits=*/0,
7340 /*Lanes=*/kBothLanes);
7341 break;
7342
7343 // Floating-Point Absolute Compare Greater Than/Equal
7344 case Intrinsic::aarch64_neon_facge:
7345 case Intrinsic::aarch64_neon_facgt:
7346 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/false);
7347 break;
7348
7349 default:
7350 return false;
7351 }
7352
7353 return true;
7354 }
7355
7356 void visitIntrinsicInst(IntrinsicInst &I) {
7357 if (maybeHandleCrossPlatformIntrinsic(I))
7358 return;
7359
7360 if (maybeHandleX86SIMDIntrinsic(I))
7361 return;
7362
7363 if (maybeHandleArmSIMDIntrinsic(I))
7364 return;
7365
7366 if (maybeHandleUnknownIntrinsic(I))
7367 return;
7368
7369 visitInstruction(I);
7370 }
7371
7372 void visitLibAtomicLoad(CallBase &CB) {
7373 // Since we use getNextNode here, we can't have CB terminate the BB.
7374 assert(isa<CallInst>(CB));
7375
7376 IRBuilder<> IRB(&CB);
7377 Value *Size = CB.getArgOperand(0);
7378 Value *SrcPtr = CB.getArgOperand(1);
7379 Value *DstPtr = CB.getArgOperand(2);
7380 Value *Ordering = CB.getArgOperand(3);
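// The instrumented call is assumed to follow the generic libatomic prototype
//   void __atomic_load(size_t size, void *src, void *dst, int ordering)
// which is why operands 0..3 are read as Size/SrcPtr/DstPtr/Ordering above.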
7381 // Convert the call to have at least Acquire ordering to make sure
7382 // the shadow operations aren't reordered before it.
7383 Value *NewOrdering =
7384 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
7385 CB.setArgOperand(3, NewOrdering);
7386
7387 NextNodeIRBuilder NextIRB(&CB);
7388 Value *SrcShadowPtr, *SrcOriginPtr;
7389 std::tie(SrcShadowPtr, SrcOriginPtr) =
7390 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7391 /*isStore*/ false);
7392 Value *DstShadowPtr =
7393 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7394 /*isStore*/ true)
7395 .first;
7396
7397 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
7398 if (MS.TrackOrigins) {
7399 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
7400 kMinOriginAlignment);
7401 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
7402 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
7403 }
7404 }
7405
7406 void visitLibAtomicStore(CallBase &CB) {
7407 IRBuilder<> IRB(&CB);
7408 Value *Size = CB.getArgOperand(0);
7409 Value *DstPtr = CB.getArgOperand(2);
7410 Value *Ordering = CB.getArgOperand(3);
7411 // Convert the call to have at least Release ordering to make sure
7412 // the shadow operations aren't reordered after it.
7413 Value *NewOrdering =
7414 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
7415 CB.setArgOperand(3, NewOrdering);
7416
7417 Value *DstShadowPtr =
7418 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
7419 /*isStore*/ true)
7420 .first;
7421
7422 // Atomic store always paints clean shadow/origin. See file header.
7423 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
7424 Align(1));
7425 }
7426
7427 void visitCallBase(CallBase &CB) {
7428 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
7429 if (CB.isInlineAsm()) {
7430 // For inline asm (either a call to asm function, or callbr instruction),
7431 // do the usual thing: check argument shadow and mark all outputs as
7432 // clean. Note that any side effects of the inline asm that are not
7433 // immediately visible in its constraints are not handled.
7434 if (ClHandleAsmConservative)
7435 visitAsmInstruction(CB);
7436 else
7437 visitInstruction(CB);
7438 return;
7439 }
7440 LibFunc LF;
7441 if (TLI->getLibFunc(CB, LF)) {
7442 // libatomic.a functions need to have special handling because there isn't
7443 // a good way to intercept them or compile the library with
7444 // instrumentation.
7445 switch (LF) {
7446 case LibFunc_atomic_load:
7447 if (!isa<CallInst>(CB)) {
7448 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load."
7449 "Ignoring!\n";
7450 break;
7451 }
7452 visitLibAtomicLoad(CB);
7453 return;
7454 case LibFunc_atomic_store:
7455 visitLibAtomicStore(CB);
7456 return;
7457 default:
7458 break;
7459 }
7460 }
7461
7462 if (auto *Call = dyn_cast<CallInst>(&CB)) {
7463 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
7464
7465 // We are going to insert code that relies on the fact that the callee
7466 // will become a non-readonly function after it is instrumented by us. To
7467 // prevent this code from being optimized out, mark that function
7468 // non-readonly in advance.
7469 // TODO: We can likely do better than dropping memory() completely here.
7470 AttributeMask B;
7471 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
7472
7473 Call->removeFnAttrs(B);
7474 if (Function *Func = Call->getCalledFunction()) {
7475 Func->removeFnAttrs(B);
7476 }
7477
7478 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
7479 }
7480 IRBuilder<> IRB(&CB);
7481 bool MayCheckCall = MS.EagerChecks;
7482 if (Function *Func = CB.getCalledFunction()) {
7483 // __sanitizer_unaligned_{load,store} functions may be called by users
7484 // and always expect shadows in the TLS. So don't check them.
7485 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
7486 }
7487
7488 unsigned ArgOffset = 0;
7489 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
7490 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
7491 if (!A->getType()->isSized()) {
7492 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
7493 continue;
7494 }
7495
7496 if (A->getType()->isScalableTy()) {
7497 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
7498 // Handle as noundef, but don't reserve tls slots.
7499 insertCheckShadowOf(A, &CB);
7500 continue;
7501 }
7502
7503 unsigned Size = 0;
7504 const DataLayout &DL = F.getDataLayout();
7505
7506 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
7507 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
7508 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
7509
7510 if (EagerCheck) {
7511 insertCheckShadowOf(A, &CB);
7512 Size = DL.getTypeAllocSize(A->getType());
7513 } else {
7514 [[maybe_unused]] Value *Store = nullptr;
7515 // Compute the Shadow for arg even if it is ByVal, because
7516 // in that case getShadow() will copy the actual arg shadow to
7517 // __msan_param_tls.
7518 Value *ArgShadow = getShadow(A);
7519 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
7520 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
7521 << " Shadow: " << *ArgShadow << "\n");
7522 if (ByVal) {
7523 // ByVal requires some special handling as it's too big for a single
7524 // load
7525 assert(A->getType()->isPointerTy() &&
7526 "ByVal argument is not a pointer!");
7527 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
7528 if (ArgOffset + Size > kParamTLSSize)
7529 break;
7530 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
7531 MaybeAlign Alignment = std::nullopt;
7532 if (ParamAlignment)
7533 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
7534 Value *AShadowPtr, *AOriginPtr;
7535 std::tie(AShadowPtr, AOriginPtr) =
7536 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
7537 /*isStore*/ false);
7538 if (!PropagateShadow) {
7539 Store = IRB.CreateMemSet(ArgShadowBase,
7540 Constant::getNullValue(IRB.getInt8Ty()),
7541 Size, Alignment);
7542 } else {
7543 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
7544 Alignment, Size);
7545 if (MS.TrackOrigins) {
7546 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
7547 // FIXME: OriginSize should be:
7548 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
7549 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
7550 IRB.CreateMemCpy(
7551 ArgOriginBase,
7552 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
7553 AOriginPtr,
7554 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
7555 }
7556 }
7557 } else {
7558 // Any other parameters mean we need bit-grained tracking of uninit
7559 // data
7560 Size = DL.getTypeAllocSize(A->getType());
7561 if (ArgOffset + Size > kParamTLSSize)
7562 break;
7563 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
7564 kShadowTLSAlignment);
7565 Constant *Cst = dyn_cast<Constant>(ArgShadow);
7566 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
7567 IRB.CreateStore(getOrigin(A),
7568 getOriginPtrForArgument(IRB, ArgOffset));
7569 }
7570 }
7571 assert(Store != nullptr);
7572 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
7573 }
7574 assert(Size != 0);
7575 ArgOffset += alignTo(Size, kShadowTLSAlignment);
7576 }
7577 LLVM_DEBUG(dbgs() << " done with call args\n");
7578
7579 FunctionType *FT = CB.getFunctionType();
7580 if (FT->isVarArg()) {
7581 VAHelper->visitCallBase(CB, IRB);
7582 }
7583
7584 // Now, get the shadow for the RetVal.
7585 if (!CB.getType()->isSized())
7586 return;
7587 // Don't emit the epilogue for musttail call returns.
7588 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
7589 return;
7590
7591 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
7592 setShadow(&CB, getCleanShadow(&CB));
7593 setOrigin(&CB, getCleanOrigin());
7594 return;
7595 }
7596
7597 IRBuilder<> IRBBefore(&CB);
7598 // Until we have full dynamic coverage, make sure the retval shadow is 0.
7599 Value *Base = getShadowPtrForRetval(IRBBefore);
7600 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
7601 kShadowTLSAlignment);
7602 BasicBlock::iterator NextInsn;
7603 if (isa<CallInst>(CB)) {
7604 NextInsn = ++CB.getIterator();
7605 assert(NextInsn != CB.getParent()->end());
7606 } else {
7607 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
7608 if (!NormalDest->getSinglePredecessor()) {
7609 // FIXME: this case is tricky, so we are just conservative here.
7610 // Perhaps we need to split the edge between this BB and NormalDest,
7611 // but a naive attempt to use SplitEdge leads to a crash.
7612 setShadow(&CB, getCleanShadow(&CB));
7613 setOrigin(&CB, getCleanOrigin());
7614 return;
7615 }
7616 // FIXME: NextInsn is likely in a basic block that has not been visited
7617 // yet. Anything inserted there will be instrumented by MSan later!
7618 NextInsn = NormalDest->getFirstInsertionPt();
7619 assert(NextInsn != NormalDest->end() &&
7620 "Could not find insertion point for retval shadow load");
7621 }
7622 IRBuilder<> IRBAfter(&*NextInsn);
7623 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7624 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
7625 "_msret");
7626 setShadow(&CB, RetvalShadow);
7627 if (MS.TrackOrigins)
7628 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
7629 }
7630
7631 bool isAMustTailRetVal(Value *RetVal) {
7632 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
7633 RetVal = I->getOperand(0);
7634 }
7635 if (auto *I = dyn_cast<CallInst>(RetVal)) {
7636 return I->isMustTailCall();
7637 }
7638 return false;
7639 }
7640
7641 void visitReturnInst(ReturnInst &I) {
7642 IRBuilder<> IRB(&I);
7643 Value *RetVal = I.getReturnValue();
7644 if (!RetVal)
7645 return;
7646 // Don't emit the epilogue for musttail call returns.
7647 if (isAMustTailRetVal(RetVal))
7648 return;
7649 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7650 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
7651 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7652 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7653 // must always return fully initialized values. For now, we hardcode "main".
7654 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7655
7656 Value *Shadow = getShadow(RetVal);
7657 bool StoreOrigin = true;
7658 if (EagerCheck) {
7659 insertCheckShadowOf(RetVal, &I);
7660 Shadow = getCleanShadow(RetVal);
7661 StoreOrigin = false;
7662 }
7663
7664 // The caller may still expect information passed over TLS if we pass our
7665 // check
7666 if (StoreShadow) {
7667 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
7668 if (MS.TrackOrigins && StoreOrigin)
7669 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
7670 }
7671 }
7672
7673 void visitPHINode(PHINode &I) {
7674 IRBuilder<> IRB(&I);
7675 if (!PropagateShadow) {
7676 setShadow(&I, getCleanShadow(&I));
7677 setOrigin(&I, getCleanOrigin());
7678 return;
7679 }
7680
7681 ShadowPHINodes.push_back(&I);
7682 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
7683 "_msphi_s"));
7684 if (MS.TrackOrigins)
7685 setOrigin(
7686 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
7687 }
7688
7689 Value *getLocalVarIdptr(AllocaInst &I) {
7690 ConstantInt *IntConst =
7691 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
7692 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7693 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7694 IntConst);
7695 }
7696
7697 Value *getLocalVarDescription(AllocaInst &I) {
7698 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
7699 }
7700
7701 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7702 if (PoisonStack && ClPoisonStackWithCall) {
7703 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
7704 } else {
7705 Value *ShadowBase, *OriginBase;
7706 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
7707 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
7708
7709 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
7710 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
7711 }
7712
7713 if (PoisonStack && MS.TrackOrigins) {
7714 Value *Idptr = getLocalVarIdptr(I);
7715 if (ClPrintStackNames) {
7716 Value *Descr = getLocalVarDescription(I);
7717 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
7718 {&I, Len, Idptr, Descr});
7719 } else {
7720 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
7721 }
7722 }
7723 }
7724
7725 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7726 Value *Descr = getLocalVarDescription(I);
7727 if (PoisonStack) {
7728 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
7729 } else {
7730 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
7731 }
7732 }
7733
7734 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7735 if (!InsPoint)
7736 InsPoint = &I;
7737 NextNodeIRBuilder IRB(InsPoint);
7738 Value *Len = IRB.CreateAllocationSize(MS.IntptrTy, &I);
7739
7740 if (MS.CompileKernel)
7741 poisonAllocaKmsan(I, IRB, Len);
7742 else
7743 poisonAllocaUserspace(I, IRB, Len);
7744 }
7745
7746 void visitAllocaInst(AllocaInst &I) {
7747 setShadow(&I, getCleanShadow(&I));
7748 setOrigin(&I, getCleanOrigin());
7749 // We'll get to this alloca later unless it's poisoned at the corresponding
7750 // llvm.lifetime.start.
7751 AllocaSet.insert(&I);
7752 }
7753
7754 void visitSelectInst(SelectInst &I) {
7755 // a = select b, c, d
7756 Value *B = I.getCondition();
7757 Value *C = I.getTrueValue();
7758 Value *D = I.getFalseValue();
7759
7760 handleSelectLikeInst(I, B, C, D);
7761 }
7762
7763 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7764 IRBuilder<> IRB(&I);
7765
7766 Value *Sb = getShadow(B);
7767 Value *Sc = getShadow(C);
7768 Value *Sd = getShadow(D);
7769
7770 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
7771 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
7772 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
7773
7774 // Result shadow if condition shadow is 0.
7775 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
7776 Value *Sa1;
7777 if (I.getType()->isAggregateType()) {
7778 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7779 // an extra "select". This results in much more compact IR.
7780 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7781 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7782 } else if (isScalableNonVectorType(I.getType())) {
7783 // This is intended to handle target("aarch64.svcount"), which can't be
7784 // handled in the else branch because of incompatibility with CreateXor
7785 // ("The supported LLVM operations on this type are limited to load,
7786 // store, phi, select and alloca instructions").
7787
7788 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7789 // branch as needed instead.
7790 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7791 } else {
7792 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7793 // If Sb (condition is poisoned), look for bits in c and d that are equal
7794 // and both unpoisoned.
7795 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
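// For illustration: with c = 0b1010, d = 0b1011 and Sc = Sd = 0 (both values
// fully initialized), (c^d) | Sc | Sd = 0b0001, so when the condition shadow
// is set only the lowest bit (the one where c and d actually differ) is
// reported as poisoned.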
7796
7797 // Cast arguments to shadow-compatible type.
7798 C = CreateAppToShadowCast(IRB, C);
7799 D = CreateAppToShadowCast(IRB, D);
7800
7801 // Result shadow if condition shadow is 1.
7802 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7803 }
7804 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7805 setShadow(&I, Sa);
7806 if (MS.TrackOrigins) {
7807 // Origins are always i32, so any vector conditions must be flattened.
7808 // FIXME: consider tracking vector origins for app vectors?
7809 if (B->getType()->isVectorTy()) {
7810 B = convertToBool(B, IRB);
7811 Sb = convertToBool(Sb, IRB);
7812 }
7813 // a = select b, c, d
7814 // Oa = Sb ? Ob : (b ? Oc : Od)
7815 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7816 }
7817 }
7818
7819 void visitLandingPadInst(LandingPadInst &I) {
7820 // Do nothing.
7821 // See https://github.com/google/sanitizers/issues/504
7822 setShadow(&I, getCleanShadow(&I));
7823 setOrigin(&I, getCleanOrigin());
7824 }
7825
7826 void visitCatchSwitchInst(CatchSwitchInst &I) {
7827 setShadow(&I, getCleanShadow(&I));
7828 setOrigin(&I, getCleanOrigin());
7829 }
7830
7831 void visitFuncletPadInst(FuncletPadInst &I) {
7832 setShadow(&I, getCleanShadow(&I));
7833 setOrigin(&I, getCleanOrigin());
7834 }
7835
7836 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7837
7838 void visitExtractValueInst(ExtractValueInst &I) {
7839 IRBuilder<> IRB(&I);
7840 Value *Agg = I.getAggregateOperand();
7841 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7842 Value *AggShadow = getShadow(Agg);
7843 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7844 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7845 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7846 setShadow(&I, ResShadow);
7847 setOriginForNaryOp(I);
7848 }
7849
7850 void visitInsertValueInst(InsertValueInst &I) {
7851 IRBuilder<> IRB(&I);
7852 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7853 Value *AggShadow = getShadow(I.getAggregateOperand());
7854 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7855 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7856 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7857 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7858 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7859 setShadow(&I, Res);
7860 setOriginForNaryOp(I);
7861 }
7862
7863 void dumpInst(Instruction &I, const Twine &Prefix) {
7864 // Instruction name only
7865 // For intrinsics, the full/overloaded name is used
7866 //
7867 // e.g., "call llvm.aarch64.neon.uqsub.v16i8"
7868 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7869 errs() << "ZZZ:" << Prefix << " call "
7870 << CI->getCalledFunction()->getName() << "\n";
7871 } else {
7872 errs() << "ZZZ:" << Prefix << " " << I.getOpcodeName() << "\n";
7873 }
7874
7875 // Instruction prototype (including return type and parameter types)
7876 // For intrinsics, we use the base/non-overloaded name
7877 //
7878 // e.g., "call <16 x i8> @llvm.aarch64.neon.uqsub(<16 x i8>, <16 x i8>)"
7879 unsigned NumOperands = I.getNumOperands();
7880 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7881 errs() << "YYY:" << Prefix << " call " << *I.getType() << " @";
7882
7883 if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI))
7884 errs() << Intrinsic::getBaseName(II->getIntrinsicID());
7885 else
7886 errs() << CI->getCalledFunction()->getName();
7887
7888 errs() << "(";
7889
7890 // The last operand of a CallInst is the function itself.
7891 NumOperands--;
7892 } else
7893 errs() << "YYY:" << Prefix << " " << *I.getType() << " "
7894 << I.getOpcodeName() << "(";
7895
7896 for (size_t i = 0; i < NumOperands; i++) {
7897 if (i > 0)
7898 errs() << ", ";
7899
7900 errs() << *(I.getOperand(i)->getType());
7901 }
7902
7903 errs() << ")\n";
7904
7905 // Full instruction, including types and operand values
7906 // For intrinsics, the full/overloaded name is used
7907 //
7908 // e.g., "%vqsubq_v.i15 = call noundef <16 x i8>
7909 // @llvm.aarch64.neon.uqsub.v16i8(<16 x i8> %vext21.i,
7910 // <16 x i8> splat (i8 1)), !dbg !66"
7911 errs() << "QQQ:" << Prefix << " " << I << "\n";
7912 }
7913
7914 void visitResumeInst(ResumeInst &I) {
7915 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7916 // Nothing to do here.
7917 }
7918
7919 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7920 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7921 // Nothing to do here.
7922 }
7923
7924 void visitCatchReturnInst(CatchReturnInst &CRI) {
7925 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7926 // Nothing to do here.
7927 }
7928
7929 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7930 IRBuilder<> &IRB, const DataLayout &DL,
7931 bool isOutput) {
7932 // For each assembly argument, we check its value for being initialized.
7933 // If the argument is a pointer, we assume it points to a single element
7934 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7935 // Each such pointer is instrumented with a call to the runtime library.
7936 Type *OpType = Operand->getType();
7937 // Check the operand value itself.
7938 insertCheckShadowOf(Operand, &I);
7939 if (!OpType->isPointerTy() || !isOutput) {
7940 assert(!isOutput);
7941 return;
7942 }
7943 if (!ElemTy->isSized())
7944 return;
7945 auto Size = DL.getTypeStoreSize(ElemTy);
7946 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7947 if (MS.CompileKernel) {
7948 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7949 } else {
7950 // ElemTy, derived from elementtype(), does not encode the alignment of
7951 // the pointer. Conservatively assume that the shadow memory is unaligned.
7952 // When Size is large, avoid StoreInst as it would expand to many
7953 // instructions.
7954 auto [ShadowPtr, _] =
7955 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7956 if (Size <= 32)
7957 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7958 else
7959 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7960 SizeVal, Align(1));
7961 }
7962 }
7963
7964 /// Get the number of output arguments returned by pointers.
7965 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7966 int NumRetOutputs = 0;
7967 int NumOutputs = 0;
7968 Type *RetTy = cast<Value>(CB)->getType();
7969 if (!RetTy->isVoidTy()) {
7970 // Register outputs are returned via the CallInst return value.
7971 auto *ST = dyn_cast<StructType>(RetTy);
7972 if (ST)
7973 NumRetOutputs = ST->getNumElements();
7974 else
7975 NumRetOutputs = 1;
7976 }
7977 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7978 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7979 switch (Info.Type) {
7980 case InlineAsm::isOutput:
7981 NumOutputs++;
7982 break;
7983 default:
7984 break;
7985 }
7986 }
7987 return NumOutputs - NumRetOutputs;
7988 }
7989
7990 void visitAsmInstruction(Instruction &I) {
7991 // Conservative inline assembly handling: check for poisoned shadow of
7992 // asm() arguments, then unpoison the result and all the memory locations
7993 // pointed to by those arguments.
7994 // An inline asm() statement in C++ contains lists of input and output
7995 // arguments used by the assembly code. These are mapped to operands of the
7996 // CallInst as follows:
7997 // - nR register outputs ("=r") are returned by value in a single structure
7998 // (SSA value of the CallInst);
7999 // - nO other outputs ("=m" and others) are returned by pointer as first
8000 // nO operands of the CallInst;
8001 // - nI inputs ("r", "m" and others) are passed to CallInst as the
8002 // remaining nI operands.
8003 // The total number of asm() arguments in the source is nR+nO+nI, and the
8004 // corresponding CallInst has nO+nI+1 operands (the last operand is the
8005 // function to be called).
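// For example, a (hypothetical) statement
//   asm("..." : "=r"(a), "=m"(b) : "r"(c));
// has nR = 1 (the "=r" output becomes the call's return value), nO = 1 (the
// "=m" output is passed as the first pointer operand) and nI = 1, so the
// CallInst carries nO + nI + 1 = 3 operands.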
8006 const DataLayout &DL = F.getDataLayout();
8007 CallBase *CB = cast<CallBase>(&I);
8008 IRBuilder<> IRB(&I);
8009 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
8010 int OutputArgs = getNumOutputArgs(IA, CB);
8011 // The last operand of a CallInst is the function itself.
8012 int NumOperands = CB->getNumOperands() - 1;
8013
8014 // Check input arguments. This is done before unpoisoning output arguments,
8015 // so that we won't overwrite uninit values before checking them.
8016 for (int i = OutputArgs; i < NumOperands; i++) {
8017 Value *Operand = CB->getOperand(i);
8018 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
8019 /*isOutput*/ false);
8020 }
8021 // Unpoison output arguments. This must happen before the actual InlineAsm
8022 // call, so that the shadow for memory published in the asm() statement
8023 // remains valid.
8024 for (int i = 0; i < OutputArgs; i++) {
8025 Value *Operand = CB->getOperand(i);
8026 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
8027 /*isOutput*/ true);
8028 }
8029
8030 setShadow(&I, getCleanShadow(&I));
8031 setOrigin(&I, getCleanOrigin());
8032 }
8033
8034 void visitFreezeInst(FreezeInst &I) {
8035 // Freeze always returns a fully defined value.
8036 setShadow(&I, getCleanShadow(&I));
8037 setOrigin(&I, getCleanOrigin());
8038 }
8039
8040 void visitInstruction(Instruction &I) {
8041 // Everything else: stop propagating and check for poisoned shadow.
8042 if (ClDumpStrictInstructions)
8043 dumpInst(I, "Strict");
8044 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
8045 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
8046 Value *Operand = I.getOperand(i);
8047 if (Operand->getType()->isSized())
8048 insertCheckShadowOf(Operand, &I);
8049 }
8050 setShadow(&I, getCleanShadow(&I));
8051 setOrigin(&I, getCleanOrigin());
8052 }
8053};
8054
8055struct VarArgHelperBase : public VarArgHelper {
8056 Function &F;
8057 MemorySanitizer &MS;
8058 MemorySanitizerVisitor &MSV;
8059 SmallVector<CallInst *, 16> VAStartInstrumentationList;
8060 const unsigned VAListTagSize;
8061
8062 VarArgHelperBase(Function &F, MemorySanitizer &MS,
8063 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
8064 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
8065
8066 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
8067 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
8068 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
8069 }
8070
8071 /// Compute the shadow address for a given va_arg.
8072 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
8073 return IRB.CreatePtrAdd(
8074 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
8075 }
8076
8077 /// Compute the shadow address for a given va_arg.
8078 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
8079 unsigned ArgSize) {
8080 // Make sure we don't overflow __msan_va_arg_tls.
8081 if (ArgOffset + ArgSize > kParamTLSSize)
8082 return nullptr;
8083 return getShadowPtrForVAArgument(IRB, ArgOffset);
8084 }
8085
8086 /// Compute the origin address for a given va_arg.
8087 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
8088 // getOriginPtrForVAArgument() is always called after
8089 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
8090 // overflow.
8091 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
8092 ConstantInt::get(MS.IntptrTy, ArgOffset),
8093 "_msarg_va_o");
8094 }
8095
8096 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
8097 unsigned BaseOffset) {
8098 // The tail of __msan_va_arg_tls is not large enough to fit the full
8099 // value shadow, but it will be copied to the backup anyway. Make it
8100 // clean.
8101 if (BaseOffset >= kParamTLSSize)
8102 return;
8103 Value *TailSize =
8104 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
8105 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
8106 TailSize, Align(8));
8107 }
8108
8109 void unpoisonVAListTagForInst(IntrinsicInst &I) {
8110 IRBuilder<> IRB(&I);
8111 Value *VAListTag = I.getArgOperand(0);
8112 const Align Alignment = Align(8);
8113 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
8114 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
8115 // Unpoison the whole __va_list_tag.
8116 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
8117 VAListTagSize, Alignment, false);
8118 }
8119
8120 void visitVAStartInst(VAStartInst &I) override {
8121 if (F.getCallingConv() == CallingConv::Win64)
8122 return;
8123 VAStartInstrumentationList.push_back(&I);
8124 unpoisonVAListTagForInst(I);
8125 }
8126
8127 void visitVACopyInst(VACopyInst &I) override {
8128 if (F.getCallingConv() == CallingConv::Win64)
8129 return;
8130 unpoisonVAListTagForInst(I);
8131 }
8132};
8133
8134/// AMD64-specific implementation of VarArgHelper.
8135struct VarArgAMD64Helper : public VarArgHelperBase {
8136 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
8137 // See a comment in visitCallBase for more details.
8138 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
8139 static const unsigned AMD64FpEndOffsetSSE = 176;
8140 // If SSE is disabled, fp_offset in va_list is zero.
8141 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
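// (For reference: 48 and 176 follow from the SysV x86_64 register save area:
// six 8-byte general-purpose argument registers give 6 * 8 = 48, and eight
// 16-byte SSE registers add 8 * 16 = 128, for 176 in total.)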
8142
8143 unsigned AMD64FpEndOffset;
8144 AllocaInst *VAArgTLSCopy = nullptr;
8145 AllocaInst *VAArgTLSOriginCopy = nullptr;
8146 Value *VAArgOverflowSize = nullptr;
8147
8148 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
8149
8150 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
8151 MemorySanitizerVisitor &MSV)
8152 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
8153 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
8154 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
8155 if (Attr.isStringAttribute() &&
8156 (Attr.getKindAsString() == "target-features")) {
8157 if (Attr.getValueAsString().contains("-sse"))
8158 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
8159 break;
8160 }
8161 }
8162 }
8163
8164 ArgKind classifyArgument(Value *arg) {
8165 // A very rough approximation of X86_64 argument classification rules.
8166 Type *T = arg->getType();
8167 if (T->isX86_FP80Ty())
8168 return AK_Memory;
8169 if (T->isFPOrFPVectorTy())
8170 return AK_FloatingPoint;
8171 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
8172 return AK_GeneralPurpose;
8173 if (T->isPointerTy())
8174 return AK_GeneralPurpose;
8175 return AK_Memory;
8176 }
8177
8178 // For VarArg functions, store the argument shadow in an ABI-specific format
8179 // that corresponds to va_list layout.
8180 // We do this because Clang lowers va_arg in the frontend, and this pass
8181 // only sees the low level code that deals with va_list internals.
8182 // A much easier alternative (provided that Clang emits va_arg instructions)
8183 // would have been to associate each live instance of va_list with a copy of
8184 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
8185 // order.
8186 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8187 unsigned GpOffset = 0;
8188 unsigned FpOffset = AMD64GpEndOffset;
8189 unsigned OverflowOffset = AMD64FpEndOffset;
8190 const DataLayout &DL = F.getDataLayout();
8191
8192 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8193 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8194 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8195 if (IsByVal) {
8196 // ByVal arguments always go to the overflow area.
8197 // Fixed arguments passed through the overflow area will be stepped
8198 // over by va_start, so don't count them towards the offset.
8199 if (IsFixed)
8200 continue;
8201 assert(A->getType()->isPointerTy());
8202 Type *RealTy = CB.getParamByValType(ArgNo);
8203 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8204 uint64_t AlignedSize = alignTo(ArgSize, 8);
8205 unsigned BaseOffset = OverflowOffset;
8206 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
8207 Value *OriginBase = nullptr;
8208 if (MS.TrackOrigins)
8209 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
8210 OverflowOffset += AlignedSize;
8211
8212 if (OverflowOffset > kParamTLSSize) {
8213 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
8214 continue; // We have no space to copy shadow there.
8215 }
8216
8217 Value *ShadowPtr, *OriginPtr;
8218 std::tie(ShadowPtr, OriginPtr) =
8219 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
8220 /*isStore*/ false);
8221 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
8222 kShadowTLSAlignment, ArgSize);
8223 if (MS.TrackOrigins)
8224 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
8225 kShadowTLSAlignment, ArgSize);
8226 } else {
8227 ArgKind AK = classifyArgument(A);
8228 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
8229 AK = AK_Memory;
8230 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
8231 AK = AK_Memory;
8232 Value *ShadowBase, *OriginBase = nullptr;
8233 switch (AK) {
8234 case AK_GeneralPurpose:
8235 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
8236 if (MS.TrackOrigins)
8237 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
8238 GpOffset += 8;
8239 assert(GpOffset <= kParamTLSSize);
8240 break;
8241 case AK_FloatingPoint:
8242 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
8243 if (MS.TrackOrigins)
8244 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8245 FpOffset += 16;
8246 assert(FpOffset <= kParamTLSSize);
8247 break;
8248 case AK_Memory:
8249 if (IsFixed)
8250 continue;
8251 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8252 uint64_t AlignedSize = alignTo(ArgSize, 8);
8253 unsigned BaseOffset = OverflowOffset;
8254 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
8255 if (MS.TrackOrigins) {
8256 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
8257 }
8258 OverflowOffset += AlignedSize;
8259 if (OverflowOffset > kParamTLSSize) {
8260 // We have no space to copy shadow there.
8261 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
8262 continue;
8263 }
8264 }
8265 // Take fixed arguments into account for GpOffset and FpOffset,
8266 // but don't actually store shadows for them.
8267 // TODO(glider): don't call get*PtrForVAArgument() for them.
8268 if (IsFixed)
8269 continue;
8270 Value *Shadow = MSV.getShadow(A);
8271 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
8272 if (MS.TrackOrigins) {
8273 Value *Origin = MSV.getOrigin(A);
8274 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8275 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8276 std::max(kShadowTLSAlignment, kMinOriginAlignment));
8277 }
8278 }
8279 }
8280 Constant *OverflowSize =
8281 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
8282 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8283 }
8284
8285 void finalizeInstrumentation() override {
8286 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8287 "finalizeInstrumentation called twice");
8288 if (!VAStartInstrumentationList.empty()) {
8289 // If there is a va_start in this function, make a backup copy of
8290 // va_arg_tls somewhere in the function entry block.
8291 IRBuilder<> IRB(MSV.FnPrologueEnd);
8292 VAArgOverflowSize =
8293 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8294 Value *CopySize = IRB.CreateAdd(
8295 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
8296 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8297 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8298 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8299 CopySize, kShadowTLSAlignment, false);
8300
8301 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8302 Intrinsic::umin, CopySize,
8303 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8304 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8305 kShadowTLSAlignment, SrcSize);
8306 if (MS.TrackOrigins) {
8307 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8308 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8309 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8310 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8311 }
8312 }
8313
8314 // Instrument va_start.
8315 // Copy va_list shadow from the backup copy of the TLS contents.
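// (The constants 16 and 8 used below are the offsets of reg_save_area and
// overflow_arg_area within the SysV x86_64 va_list struct: gp_offset at 0,
// fp_offset at 4, overflow_arg_area at 8 and reg_save_area at 16.)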
8316 for (CallInst *OrigInst : VAStartInstrumentationList) {
8317 NextNodeIRBuilder IRB(OrigInst);
8318 Value *VAListTag = OrigInst->getArgOperand(0);
8319
8320 Value *RegSaveAreaPtrPtr =
8321 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
8322 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8323 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8324 const Align Alignment = Align(16);
8325 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8326 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8327 Alignment, /*isStore*/ true);
8328 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8329 AMD64FpEndOffset);
8330 if (MS.TrackOrigins)
8331 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8332 Alignment, AMD64FpEndOffset);
8333 Value *OverflowArgAreaPtrPtr =
8334 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
8335 Value *OverflowArgAreaPtr =
8336 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8337 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8338 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8339 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8340 Alignment, /*isStore*/ true);
8341 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8342 AMD64FpEndOffset);
8343 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8344 VAArgOverflowSize);
8345 if (MS.TrackOrigins) {
8346 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8347 AMD64FpEndOffset);
8348 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8349 VAArgOverflowSize);
8350 }
8351 }
8352 }
8353};
8354
8355/// AArch64-specific implementation of VarArgHelper.
8356struct VarArgAArch64Helper : public VarArgHelperBase {
8357 static const unsigned kAArch64GrArgSize = 64;
8358 static const unsigned kAArch64VrArgSize = 128;
8359
8360 static const unsigned AArch64GrBegOffset = 0;
8361 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
8362 // Make VR space aligned to 16 bytes.
8363 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
8364 static const unsigned AArch64VrEndOffset =
8365 AArch64VrBegOffset + kAArch64VrArgSize;
8366 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
8367
8368 AllocaInst *VAArgTLSCopy = nullptr;
8369 Value *VAArgOverflowSize = nullptr;
8370
8371 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
8372
8373 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
8374 MemorySanitizerVisitor &MSV)
8375 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
8376
8377 // A very rough approximation of aarch64 argument classification rules.
8378 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
8379 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
8380 return {AK_GeneralPurpose, 1};
8381 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
8382 return {AK_FloatingPoint, 1};
8383
8384 if (T->isArrayTy()) {
8385 auto R = classifyArgument(T->getArrayElementType());
8386 R.second *= T->getScalarType()->getArrayNumElements();
8387 return R;
8388 }
8389
8390 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
8391 auto R = classifyArgument(FV->getScalarType());
8392 R.second *= FV->getNumElements();
8393 return R;
8394 }
8395
8396 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
8397 return {AK_Memory, 0};
8398 }
8399
8400 // The instrumentation stores the argument shadow in a non ABI-specific
8401 // format because it does not know which argument is named (since Clang,
8402 // as in the x86_64 case, lowers va_arg in the frontend and this pass only
8403 // sees the low level code that deals with va_list internals).
8404 // The first eight GR registers are saved in the first 64 bytes of the
8405 // va_arg TLS array, followed by the first 8 FP/SIMD registers, and then
8406 // the remaining arguments.
8407 // Using constant offset within the va_arg TLS array allows fast copy
8408 // in the finalize instrumentation.
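// Resulting layout of the va_arg TLS copy used by this helper:
//   [0, 64)    GR register shadow (kAArch64GrArgSize bytes)
//   [64, 192)  FP/SIMD register shadow (kAArch64VrArgSize bytes)
//   [192, ...) shadow for arguments passed on the stack (AArch64VAEndOffset)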
8409 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8410 unsigned GrOffset = AArch64GrBegOffset;
8411 unsigned VrOffset = AArch64VrBegOffset;
8412 unsigned OverflowOffset = AArch64VAEndOffset;
8413
8414 const DataLayout &DL = F.getDataLayout();
8415 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8416 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8417 auto [AK, RegNum] = classifyArgument(A->getType());
8418 if (AK == AK_GeneralPurpose &&
8419 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
8420 AK = AK_Memory;
8421 if (AK == AK_FloatingPoint &&
8422 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
8423 AK = AK_Memory;
8424 Value *Base;
8425 switch (AK) {
8426 case AK_GeneralPurpose:
8427 Base = getShadowPtrForVAArgument(IRB, GrOffset);
8428 GrOffset += 8 * RegNum;
8429 break;
8430 case AK_FloatingPoint:
8431 Base = getShadowPtrForVAArgument(IRB, VrOffset);
8432 VrOffset += 16 * RegNum;
8433 break;
8434 case AK_Memory:
8435 // Don't count fixed arguments in the overflow area - va_start will
8436 // skip right over them.
8437 if (IsFixed)
8438 continue;
8439 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8440 uint64_t AlignedSize = alignTo(ArgSize, 8);
8441 unsigned BaseOffset = OverflowOffset;
8442 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
8443 OverflowOffset += AlignedSize;
8444 if (OverflowOffset > kParamTLSSize) {
8445 // We have no space to copy shadow there.
8446 CleanUnusedTLS(IRB, Base, BaseOffset);
8447 continue;
8448 }
8449 break;
8450 }
8451 // Count Gp/Vr fixed arguments to their respective offsets, but don't
8452 // bother to actually store a shadow.
8453 if (IsFixed)
8454 continue;
8455 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8456 }
8457 Constant *OverflowSize =
8458 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
8459 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8460 }
8461
8462 // Retrieve a va_list field of 'void*' size.
8463 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8464 Value *SaveAreaPtrPtr =
8465 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8466 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
8467 }
8468
8469 // Retrieve a va_list field of 'int' size.
8470 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8471 Value *SaveAreaPtr =
8472 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8473 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
8474 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
8475 }
8476
8477 void finalizeInstrumentation() override {
8478 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8479 "finalizeInstrumentation called twice");
8480 if (!VAStartInstrumentationList.empty()) {
8481 // If there is a va_start in this function, make a backup copy of
8482 // va_arg_tls somewhere in the function entry block.
8483 IRBuilder<> IRB(MSV.FnPrologueEnd);
8484 VAArgOverflowSize =
8485 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8486 Value *CopySize = IRB.CreateAdd(
8487 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
8488 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8489 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8490 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8491 CopySize, kShadowTLSAlignment, false);
8492
8493 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8494 Intrinsic::umin, CopySize,
8495 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8496 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8497 kShadowTLSAlignment, SrcSize);
8498 }
8499
8500 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
8501 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
8502
8503 // Instrument va_start, copy va_list shadow from the backup copy of
8504 // the TLS contents.
8505 for (CallInst *OrigInst : VAStartInstrumentationList) {
8506 NextNodeIRBuilder IRB(OrigInst);
8507
8508 Value *VAListTag = OrigInst->getArgOperand(0);
8509
8510 // The variadic ABI for AArch64 creates two areas to save the incoming
8511 // argument registers (one for the 64-bit general-purpose registers x0-x7
8512 // and another for the 128-bit FP/SIMD registers v0-v7).
8513 // We then need to propagate the shadow arguments to both regions
8514 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
8515 // The shadow for the remaining arguments is saved at 'va::stack'.
8516 // One caveat is that only the unnamed (variadic) arguments need to be
8517 // propagated, but the call site instrumentation saves shadow for all
8518 // arguments. So to copy the shadow values from the va_arg TLS array
8519 // we need to adjust the offset for both the GR and VR fields based on
8520 // the __{gr,vr}_offs values (since the saves are laid out relative to
8521 // the incoming named arguments).
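// (The getVAField{64,32} offsets used below follow the AAPCS64 va_list
// layout: __stack at offset 0, __gr_top at 8, __vr_top at 16, __gr_offs at
// 24 and __vr_offs at 28.)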
8522 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
8523
8524 // Read the stack pointer from the va_list.
8525 Value *StackSaveAreaPtr =
8526 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
8527
8528 // Read both the __gr_top and __gr_off and add them up.
8529 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
8530 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
8531
8532 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
8533 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
8534
8535 // Read both the __vr_top and __vr_off and add them up.
8536 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
8537 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
8538
8539 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
8540 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
8541
8542 // The instrumentation does not know how many named arguments are in use
8543 // and, at the call site, all the arguments were saved. Since __gr_offs is
8544 // defined as '0 - ((8 - named_gr) * 8)', the idea is to propagate only the
8545 // variadic arguments by skipping the bytes of shadow from named arguments.
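// For example, with three named GR arguments __gr_offs is -(8 - 3) * 8 = -40,
// so GrArgSize + __gr_offs = 64 - 40 = 24 skips the three named 8-byte slots
// in the TLS copy, and the remaining 40 bytes of GR shadow are copied below.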
8546 Value *GrRegSaveAreaShadowPtrOff =
8547 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
8548
8549 Value *GrRegSaveAreaShadowPtr =
8550 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8551 Align(8), /*isStore*/ true)
8552 .first;
8553
8554 Value *GrSrcPtr =
8555 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
8556 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
8557
8558 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
8559 GrCopySize);
8560
8561 // Again, but for FP/SIMD values.
8562 Value *VrRegSaveAreaShadowPtrOff =
8563 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
8564
8565 Value *VrRegSaveAreaShadowPtr =
8566 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8567 Align(8), /*isStore*/ true)
8568 .first;
8569
8570 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
8571 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
8572 IRB.getInt32(AArch64VrBegOffset)),
8573 VrRegSaveAreaShadowPtrOff);
8574 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
8575
8576 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
8577 VrCopySize);
8578
8579 // And finally for remaining arguments.
8580 Value *StackSaveAreaShadowPtr =
8581 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
8582 Align(16), /*isStore*/ true)
8583 .first;
8584
8585 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
8586 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
8587
8588 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
8589 Align(16), VAArgOverflowSize);
8590 }
8591 }
8592};
8593
8594/// PowerPC64-specific implementation of VarArgHelper.
8595struct VarArgPowerPC64Helper : public VarArgHelperBase {
8596 AllocaInst *VAArgTLSCopy = nullptr;
8597 Value *VAArgSize = nullptr;
8598
8599 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
8600 MemorySanitizerVisitor &MSV)
8601 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
8602
8603 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8604 // For PowerPC, we need to deal with alignment of stack arguments -
8605 // they are mostly aligned to 8 bytes, but vectors and i128 arrays
8606 // are aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
8607 // For that reason, we compute the current offset from the stack pointer
8608 // (which is always properly aligned) and the offset for the first vararg,
8609 // then subtract them.
8610 unsigned VAArgBase;
8611 Triple TargetTriple(F.getParent()->getTargetTriple());
8612 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
8613 // and 32 bytes for ABIv2. This is usually determined by target
8614 // endianness, but in theory could be overridden by function attribute.
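// (48 bytes is the ELFv1 stack frame header: back chain, CR save, LR save,
// two reserved doublewords and the TOC save slot; ELFv2 drops the reserved
// doublewords, shrinking the header to 32 bytes.)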
8615 if (TargetTriple.isPPC64ELFv2ABI())
8616 VAArgBase = 32;
8617 else
8618 VAArgBase = 48;
8619 unsigned VAArgOffset = VAArgBase;
8620 const DataLayout &DL = F.getDataLayout();
8621 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8622 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8623 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8624 if (IsByVal) {
8625 assert(A->getType()->isPointerTy());
8626 Type *RealTy = CB.getParamByValType(ArgNo);
8627 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8628 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
8629 if (ArgAlign < 8)
8630 ArgAlign = Align(8);
8631 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8632 if (!IsFixed) {
8633 Value *Base =
8634 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8635 if (Base) {
8636 Value *AShadowPtr, *AOriginPtr;
8637 std::tie(AShadowPtr, AOriginPtr) =
8638 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8639 kShadowTLSAlignment, /*isStore*/ false);
8640
8641 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8642 kShadowTLSAlignment, ArgSize);
8643 }
8644 }
8645 VAArgOffset += alignTo(ArgSize, Align(8));
8646 } else {
8647 Value *Base;
8648 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8649 Align ArgAlign = Align(8);
8650 if (A->getType()->isArrayTy()) {
8651 // Arrays are aligned to element size, except for long double
8652 // arrays, which are aligned to 8 bytes.
8653 Type *ElementTy = A->getType()->getArrayElementType();
8654 if (!ElementTy->isPPC_FP128Ty())
8655 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8656 } else if (A->getType()->isVectorTy()) {
8657 // Vectors are naturally aligned.
8658 ArgAlign = Align(ArgSize);
8659 }
8660 if (ArgAlign < 8)
8661 ArgAlign = Align(8);
8662 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8663 if (DL.isBigEndian()) {
8664 // Adjust the shadow for arguments with size < 8 to match the
8665 // placement of bits in a big-endian system.
8666 if (ArgSize < 8)
8667 VAArgOffset += (8 - ArgSize);
8668 }
8669 if (!IsFixed) {
8670 Base =
8671 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8672 if (Base)
8673 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8674 }
8675 VAArgOffset += ArgSize;
8676 VAArgOffset = alignTo(VAArgOffset, Align(8));
8677 }
8678 if (IsFixed)
8679 VAArgBase = VAArgOffset;
8680 }
8681
8682 Constant *TotalVAArgSize =
8683 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8684 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8685 // new class member, i.e. it holds the total size of all varargs.
8686 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8687 }
8688
8689 void finalizeInstrumentation() override {
8690 assert(!VAArgSize && !VAArgTLSCopy &&
8691 "finalizeInstrumentation called twice");
8692 IRBuilder<> IRB(MSV.FnPrologueEnd);
8693 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8694 Value *CopySize = VAArgSize;
8695
8696 if (!VAStartInstrumentationList.empty()) {
8697 // If there is a va_start in this function, make a backup copy of
8698 // va_arg_tls somewhere in the function entry block.
8699
8700 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8701 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8702 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8703 CopySize, kShadowTLSAlignment, false);
8704
8705 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8706 Intrinsic::umin, CopySize,
8707 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
8708 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8709 kShadowTLSAlignment, SrcSize);
8710 }
8711
8712 // Instrument va_start.
8713 // Copy va_list shadow from the backup copy of the TLS contents.
8714 for (CallInst *OrigInst : VAStartInstrumentationList) {
8715 NextNodeIRBuilder IRB(OrigInst);
8716 Value *VAListTag = OrigInst->getArgOperand(0);
8717 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8718
8719 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8720
8721 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8722 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8723 const DataLayout &DL = F.getDataLayout();
8724 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8725 const Align Alignment = Align(IntptrSize);
8726 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8727 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8728 Alignment, /*isStore*/ true);
8729 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8730 CopySize);
8731 }
8732 }
8733};
8734
8735/// PowerPC32-specific implementation of VarArgHelper.
8736struct VarArgPowerPC32Helper : public VarArgHelperBase {
8737 AllocaInst *VAArgTLSCopy = nullptr;
8738 Value *VAArgSize = nullptr;
8739
8740 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8741 MemorySanitizerVisitor &MSV)
8742 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8743
8744 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8745 unsigned VAArgBase;
8746 // Parameter save area is 8 bytes from frame pointer in PPC32
8747 VAArgBase = 8;
8748 unsigned VAArgOffset = VAArgBase;
8749 const DataLayout &DL = F.getDataLayout();
8750 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8751 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8752 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8753 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8754 if (IsByVal) {
8755 assert(A->getType()->isPointerTy());
8756 Type *RealTy = CB.getParamByValType(ArgNo);
8757 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8758 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8759 if (ArgAlign < IntptrSize)
8760 ArgAlign = Align(IntptrSize);
8761 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8762 if (!IsFixed) {
8763 Value *Base =
8764 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8765 if (Base) {
8766 Value *AShadowPtr, *AOriginPtr;
8767 std::tie(AShadowPtr, AOriginPtr) =
8768 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8769 kShadowTLSAlignment, /*isStore*/ false);
8770
8771 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8772 kShadowTLSAlignment, ArgSize);
8773 }
8774 }
8775 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8776 } else {
8777 Value *Base;
8778 Type *ArgTy = A->getType();
8779
8780 // On PPC32, floating-point variable arguments are stored in a separate
8781 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
8782 // them, as they will be found when checking call arguments.
8783 if (!ArgTy->isFloatingPointTy()) {
8784 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
8785 Align ArgAlign = Align(IntptrSize);
8786 if (ArgTy->isArrayTy()) {
8787 // Arrays are aligned to element size, except for long double
8788 // arrays, which are aligned to 8 bytes.
8789 Type *ElementTy = ArgTy->getArrayElementType();
8790 if (!ElementTy->isPPC_FP128Ty())
8791 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8792 } else if (ArgTy->isVectorTy()) {
8793 // Vectors are naturally aligned.
8794 ArgAlign = Align(ArgSize);
8795 }
8796 if (ArgAlign < IntptrSize)
8797 ArgAlign = Align(IntptrSize);
8798 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8799 if (DL.isBigEndian()) {
8800 // Adjust the shadow for arguments with size < IntptrSize to match
8801 // the placement of bits in a big-endian system.
8802 if (ArgSize < IntptrSize)
8803 VAArgOffset += (IntptrSize - ArgSize);
8804 }
8805 if (!IsFixed) {
8806 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
8807 ArgSize);
8808 if (Base)
8809 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
8810 kShadowTLSAlignment);
8811 }
8812 VAArgOffset += ArgSize;
8813 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8814 }
8815 }
8816 }
8817
8818 Constant *TotalVAArgSize =
8819 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8820 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8821 // new class member, i.e. it holds the total size of all varargs.
8822 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8823 }
8824
8825 void finalizeInstrumentation() override {
8826 assert(!VAArgSize && !VAArgTLSCopy &&
8827 "finalizeInstrumentation called twice");
8828 IRBuilder<> IRB(MSV.FnPrologueEnd);
8829 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8830 Value *CopySize = VAArgSize;
8831
8832 if (!VAStartInstrumentationList.empty()) {
8833 // If there is a va_start in this function, make a backup copy of
8834 // va_arg_tls somewhere in the function entry block.
8835
8836 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8837 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8838 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8839 CopySize, kShadowTLSAlignment, false);
8840
8841 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8842 Intrinsic::umin, CopySize,
8843 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8844 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8845 kShadowTLSAlignment, SrcSize);
8846 }
8847
8848 // Instrument va_start.
8849 // Copy va_list shadow from the backup copy of the TLS contents.
8850 for (CallInst *OrigInst : VAStartInstrumentationList) {
8851 NextNodeIRBuilder IRB(OrigInst);
8852 Value *VAListTag = OrigInst->getArgOperand(0);
8853 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8854 Value *RegSaveAreaSize = CopySize;
8855
8856 // In PPC32 va_list_tag is a struct
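// (roughly { char gpr; char fpr; short reserved; ptr overflow_arg_area;
// ptr reg_save_area; }, so reg_save_area lives at offset 8 and
// overflow_arg_area at offset 4, matching the pointer arithmetic below)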
8857 RegSaveAreaPtrPtr =
8858 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8859
8860 // On PPC32 reg_save_area can only hold 32 bytes of data
8861 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8862 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8863
8864 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8865 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8866
8867 const DataLayout &DL = F.getDataLayout();
8868 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8869 const Align Alignment = Align(IntptrSize);
8870
8871 { // Copy reg save area
8872 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8873 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8874 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8875 Alignment, /*isStore*/ true);
8876 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8877 Alignment, RegSaveAreaSize);
8878
8879 RegSaveAreaShadowPtr =
8880 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8881 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8882 ConstantInt::get(MS.IntptrTy, 32));
8883 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8884 // We fill the FP shadow with zeroes, as uninitialized FP args should have
8885 // been found during the call base check.
8886 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8887 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8888 }
8889
8890 { // Copy overflow area
8891 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8892 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8893
8894 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8895 OverflowAreaPtrPtr =
8896 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8897 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8898
8899 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8900
8901 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8902 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8903 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8904 Alignment, /*isStore*/ true);
8905
8906 Value *OverflowVAArgTLSCopyPtr =
8907 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8908 OverflowVAArgTLSCopyPtr =
8909 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8910
8911 OverflowVAArgTLSCopyPtr =
8912 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8913 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8914 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8915 }
8916 }
8917 }
8918};
8919
8920/// SystemZ-specific implementation of VarArgHelper.
8921struct VarArgSystemZHelper : public VarArgHelperBase {
8922 static const unsigned SystemZGpOffset = 16;
8923 static const unsigned SystemZGpEndOffset = 56;
8924 static const unsigned SystemZFpOffset = 128;
8925 static const unsigned SystemZFpEndOffset = 160;
8926 static const unsigned SystemZMaxVrArgs = 8;
8927 static const unsigned SystemZRegSaveAreaSize = 160;
8928 static const unsigned SystemZOverflowOffset = 160;
8929 static const unsigned SystemZVAListTagSize = 32;
8930 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8931 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
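// (For reference: bytes [16, 56) of the register save area hold the five GPR
// argument registers r2-r6, and bytes [128, 160) hold the four FPR argument
// registers f0, f2, f4 and f6; overflow arguments start right after the
// 160-byte register save area.)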
8932
8933 bool IsSoftFloatABI;
8934 AllocaInst *VAArgTLSCopy = nullptr;
8935 AllocaInst *VAArgTLSOriginCopy = nullptr;
8936 Value *VAArgOverflowSize = nullptr;
8937
8938 enum class ArgKind {
8939 GeneralPurpose,
8940 FloatingPoint,
8941 Vector,
8942 Memory,
8943 Indirect,
8944 };
8945
8946 enum class ShadowExtension { None, Zero, Sign };
8947
8948 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8949 MemorySanitizerVisitor &MSV)
8950 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8951 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8952
8953 ArgKind classifyArgument(Type *T) {
8954 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8955 // only a few possibilities of what it can be. In particular, enums, single
8956 // element structs and large types have already been taken care of.
8957
8958 // Some i128 and fp128 arguments are converted to pointers only in the
8959 // back end.
8960 if (T->isIntegerTy(128) || T->isFP128Ty())
8961 return ArgKind::Indirect;
8962 if (T->isFloatingPointTy())
8963 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8964 if (T->isIntegerTy() || T->isPointerTy())
8965 return ArgKind::GeneralPurpose;
8966 if (T->isVectorTy())
8967 return ArgKind::Vector;
8968 return ArgKind::Memory;
8969 }
8970
8971 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8972 // ABI says: "One of the simple integer types no more than 64 bits wide.
8973 // ... If such an argument is shorter than 64 bits, replace it by a full
8974 // 64-bit integer representing the same number, using sign or zero
8975 // extension". Shadow for an integer argument has the same type as the
8976 // argument itself, so it can be sign or zero extended as well.
8977 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8978 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8979 if (ZExt) {
8980 assert(!SExt);
8981 return ShadowExtension::Zero;
8982 }
8983 if (SExt) {
8984 assert(!ZExt);
8985 return ShadowExtension::Sign;
8986 }
8987 return ShadowExtension::None;
8988 }
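// For example (a hypothetical call, not taken from a test): an i32 vararg
// marked signext yields ShadowExtension::Sign, so visitCallBase() below
// sign-extends its 32-bit shadow to i64 before storing it into the 8-byte
// vararg slot, mirroring how the value itself is widened by the ABI.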
8989
8990 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8991 unsigned GpOffset = SystemZGpOffset;
8992 unsigned FpOffset = SystemZFpOffset;
8993 unsigned VrIndex = 0;
8994 unsigned OverflowOffset = SystemZOverflowOffset;
8995 const DataLayout &DL = F.getDataLayout();
8996 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8997 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8998 // SystemZABIInfo does not produce ByVal parameters.
8999 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
9000 Type *T = A->getType();
9001 ArgKind AK = classifyArgument(T);
9002 if (AK == ArgKind::Indirect) {
9003 T = MS.PtrTy;
9004 AK = ArgKind::GeneralPurpose;
9005 }
9006 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
9007 AK = ArgKind::Memory;
9008 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
9009 AK = ArgKind::Memory;
9010 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
9011 AK = ArgKind::Memory;
9012 Value *ShadowBase = nullptr;
9013 Value *OriginBase = nullptr;
9014 ShadowExtension SE = ShadowExtension::None;
9015 switch (AK) {
9016 case ArgKind::GeneralPurpose: {
9017 // Always keep track of GpOffset, but store shadow only for varargs.
9018 uint64_t ArgSize = 8;
9019 if (GpOffset + ArgSize <= kParamTLSSize) {
9020 if (!IsFixed) {
9021 SE = getShadowExtension(CB, ArgNo);
9022 uint64_t GapSize = 0;
9023 if (SE == ShadowExtension::None) {
9024 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
9025 assert(ArgAllocSize <= ArgSize);
9026 GapSize = ArgSize - ArgAllocSize;
9027 }
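// E.g. (illustrative only): an i32 vararg passed without signext or zeroext
// has ArgAllocSize == 4, so GapSize == 4 and its 4-byte shadow lands in the
// high-addressed half of the 8-byte slot, which is where the value itself
// sits on big-endian SystemZ.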
9028 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
9029 if (MS.TrackOrigins)
9030 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
9031 }
9032 GpOffset += ArgSize;
9033 } else {
9034 GpOffset = kParamTLSSize;
9035 }
9036 break;
9037 }
9038 case ArgKind::FloatingPoint: {
9039 // Always keep track of FpOffset, but store shadow only for varargs.
9040 uint64_t ArgSize = 8;
9041 if (FpOffset + ArgSize <= kParamTLSSize) {
9042 if (!IsFixed) {
9043 // PoP says: "A short floating-point datum requires only the
9044 // left-most 32 bit positions of a floating-point register".
9045 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
9046 // don't extend shadow and don't mind the gap.
9047 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
9048 if (MS.TrackOrigins)
9049 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
9050 }
9051 FpOffset += ArgSize;
9052 } else {
9053 FpOffset = kParamTLSSize;
9054 }
9055 break;
9056 }
9057 case ArgKind::Vector: {
9058 // Keep track of VrIndex. No need to store shadow, since vector varargs
9059 // go through AK_Memory.
9060 assert(IsFixed);
9061 VrIndex++;
9062 break;
9063 }
9064 case ArgKind::Memory: {
9065 // Keep track of OverflowOffset and store shadow only for varargs.
9066 // Ignore fixed args, since we need to copy only the vararg portion of
9067 // the overflow area shadow.
9068 if (!IsFixed) {
9069 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
9070 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
9071 if (OverflowOffset + ArgSize <= kParamTLSSize) {
9072 SE = getShadowExtension(CB, ArgNo);
9073 uint64_t GapSize =
9074 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
9075 ShadowBase =
9076 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
9077 if (MS.TrackOrigins)
9078 OriginBase =
9079 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
9080 OverflowOffset += ArgSize;
9081 } else {
9082 OverflowOffset = kParamTLSSize;
9083 }
9084 }
9085 break;
9086 }
9087 case ArgKind::Indirect:
9088 llvm_unreachable("Indirect must be converted to GeneralPurpose");
9089 }
9090 if (ShadowBase == nullptr)
9091 continue;
9092 Value *Shadow = MSV.getShadow(A);
9093 if (SE != ShadowExtension::None)
9094 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
9095 /*Signed*/ SE == ShadowExtension::Sign);
9096 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
9097 IRB.CreateStore(Shadow, ShadowBase);
9098 if (MS.TrackOrigins) {
9099 Value *Origin = MSV.getOrigin(A);
9100 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
9101 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
9102 kMinOriginAlignment);
9103 }
9104 }
9105 Constant *OverflowSize = ConstantInt::get(
9106 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
9107 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
9108 }
9109
9110 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
9111 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
9112 IRB.CreateAdd(
9113 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9114 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
9115 MS.PtrTy);
9116 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
9117 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
9118 const Align Alignment = Align(8);
9119 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9120 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
9121 /*isStore*/ true);
9122 // TODO(iii): copy only fragments filled by visitCallBase()
9123 // TODO(iii): support packed-stack && !use-soft-float
9124 // For use-soft-float functions, it is enough to copy just the GPRs.
9125 unsigned RegSaveAreaSize =
9126 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
9127 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9128 RegSaveAreaSize);
9129 if (MS.TrackOrigins)
9130 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
9131 Alignment, RegSaveAreaSize);
9132 }
9133
9134 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
9135 // don't know the real overflow size and can't clear shadow beyond kParamTLSSize.
9136 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
9137 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
9138 IRB.CreateAdd(
9139 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9140 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
9141 MS.PtrTy);
9142 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
9143 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
9144 const Align Alignment = Align(8);
9145 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
9146 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
9147 Alignment, /*isStore*/ true);
9148 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
9149 SystemZOverflowOffset);
9150 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
9151 VAArgOverflowSize);
9152 if (MS.TrackOrigins) {
9153 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
9154 SystemZOverflowOffset);
9155 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
9156 VAArgOverflowSize);
9157 }
9158 }
9159
9160 void finalizeInstrumentation() override {
9161 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
9162 "finalizeInstrumentation called twice");
9163 if (!VAStartInstrumentationList.empty()) {
9164 // If there is a va_start in this function, make a backup copy of
9165 // va_arg_tls somewhere in the function entry block.
9166 IRBuilder<> IRB(MSV.FnPrologueEnd);
9167 VAArgOverflowSize =
9168 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
9169 Value *CopySize =
9170 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
9171 VAArgOverflowSize);
9172 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9173 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9174 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9175 CopySize, kShadowTLSAlignment, false);
9176
9177 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9178 Intrinsic::umin, CopySize,
9179 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9180 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9181 kShadowTLSAlignment, SrcSize);
9182 if (MS.TrackOrigins) {
9183 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9184 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
9185 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
9186 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
9187 }
9188 }
9189
9190 // Instrument va_start.
9191 // Copy va_list shadow from the backup copy of the TLS contents.
9192 for (CallInst *OrigInst : VAStartInstrumentationList) {
9193 NextNodeIRBuilder IRB(OrigInst);
9194 Value *VAListTag = OrigInst->getArgOperand(0);
9195 copyRegSaveArea(IRB, VAListTag);
9196 copyOverflowArea(IRB, VAListTag);
9197 }
9198 }
9199};
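
// To make the flow above concrete, a minimal sketch (not from the test suite)
// of what this helper instruments:
//
//   int sum(int n, ...);   // callee: each va_start copies the shadow backup
//                          // over the register save and overflow areas
//   ...
//   sum(2, a, b);          // call site: visitCallBase() writes the shadows of
//                          // `a` and `b` into __msan_va_arg_tls at the
//                          // offsets they occupy in the register save area
//
// finalizeInstrumentation() snapshots __msan_va_arg_tls into VAArgTLSCopy at
// function entry, and copyRegSaveArea()/copyOverflowArea() then memcpy that
// snapshot over the shadow of the callee's actual va_list areas.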
9200
9201/// i386-specific implementation of VarArgHelper.
9202struct VarArgI386Helper : public VarArgHelperBase {
9203 AllocaInst *VAArgTLSCopy = nullptr;
9204 Value *VAArgSize = nullptr;
9205
9206 VarArgI386Helper(Function &F, MemorySanitizer &MS,
9207 MemorySanitizerVisitor &MSV)
9208 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
9209
9210 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
9211 const DataLayout &DL = F.getDataLayout();
9212 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9213 unsigned VAArgOffset = 0;
9214 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
9215 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
9216 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
9217 if (IsByVal) {
9218 assert(A->getType()->isPointerTy());
9219 Type *RealTy = CB.getParamByValType(ArgNo);
9220 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
9221 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
9222 if (ArgAlign < IntptrSize)
9223 ArgAlign = Align(IntptrSize);
9224 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
9225 if (!IsFixed) {
9226 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9227 if (Base) {
9228 Value *AShadowPtr, *AOriginPtr;
9229 std::tie(AShadowPtr, AOriginPtr) =
9230 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
9231 kShadowTLSAlignment, /*isStore*/ false);
9232
9233 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
9234 kShadowTLSAlignment, ArgSize);
9235 }
9236 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
9237 }
9238 } else {
9239 Value *Base;
9240 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
9241 Align ArgAlign = Align(IntptrSize);
9242 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
9243 if (DL.isBigEndian()) {
9244 // Adjust the shadow for arguments smaller than IntptrSize to match
9245 // the placement of bits on a big-endian system.
9246 if (ArgSize < IntptrSize)
9247 VAArgOffset += (IntptrSize - ArgSize);
9248 }
9249 if (!IsFixed) {
9250 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9251 if (Base)
9252 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
9253 VAArgOffset += ArgSize;
9254 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
9255 }
9256 }
9257 }
9258
9259 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
9260 // VAArgOverflowSizeTLS is reused as VAArgSizeTLS here to avoid adding a new
9261 // class member; it holds the total size of all varargs.
9262 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
9263 }
9264
9265 void finalizeInstrumentation() override {
9266 assert(!VAArgSize && !VAArgTLSCopy &&
9267 "finalizeInstrumentation called twice");
9268 IRBuilder<> IRB(MSV.FnPrologueEnd);
9269 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
9270 Value *CopySize = VAArgSize;
9271
9272 if (!VAStartInstrumentationList.empty()) {
9273 // If there is a va_start in this function, make a backup copy of
9274 // va_arg_tls somewhere in the function entry block.
9275 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9276 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9277 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9278 CopySize, kShadowTLSAlignment, false);
9279
9280 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9281 Intrinsic::umin, CopySize,
9282 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9283 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9284 kShadowTLSAlignment, SrcSize);
9285 }
9286
9287 // Instrument va_start.
9288 // Copy va_list shadow from the backup copy of the TLS contents.
9289 for (CallInst *OrigInst : VAStartInstrumentationList) {
9290 NextNodeIRBuilder IRB(OrigInst);
9291 Value *VAListTag = OrigInst->getArgOperand(0);
9292 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
9293 Value *RegSaveAreaPtrPtr =
9294 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9295 PointerType::get(*MS.C, 0));
9296 Value *RegSaveAreaPtr =
9297 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
9298 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
9299 const DataLayout &DL = F.getDataLayout();
9300 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9301 const Align Alignment = Align(IntptrSize);
9302 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9303 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
9304 Alignment, /*isStore*/ true);
9305 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9306 CopySize);
9307 }
9308 }
9309};
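
// On i386 every variadic argument is passed on the stack, so the helper above
// only tracks a single running VAArgOffset: the call site lays the vararg
// shadow out linearly in the TLS, records the total size in
// VAArgOverflowSizeTLS, and each va_start copies that many shadow bytes over
// the shadow of the argument area that the va_list pointer refers to.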
9310
9311/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
9312/// LoongArch64.
9313struct VarArgGenericHelper : public VarArgHelperBase {
9314 AllocaInst *VAArgTLSCopy = nullptr;
9315 Value *VAArgSize = nullptr;
9316
9317 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
9318 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
9319 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
9320
9321 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
9322 unsigned VAArgOffset = 0;
9323 const DataLayout &DL = F.getDataLayout();
9324 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9325 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
9326 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
9327 if (IsFixed)
9328 continue;
9329 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
9330 if (DL.isBigEndian()) {
9331 // Adjust the shadow for arguments smaller than IntptrSize to match the
9332 // placement of bits on a big-endian system.
9333 if (ArgSize < IntptrSize)
9334 VAArgOffset += (IntptrSize - ArgSize);
9335 }
9336 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9337 VAArgOffset += ArgSize;
9338 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
9339 if (!Base)
9340 continue;
9341 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
9342 }
9343
9344 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
9345 // VAArgOverflowSizeTLS is reused as VAArgSizeTLS here to avoid adding a new
9346 // class member; it holds the total size of all varargs.
9347 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
9348 }
9349
9350 void finalizeInstrumentation() override {
9351 assert(!VAArgSize && !VAArgTLSCopy &&
9352 "finalizeInstrumentation called twice");
9353 IRBuilder<> IRB(MSV.FnPrologueEnd);
9354 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
9355 Value *CopySize = VAArgSize;
9356
9357 if (!VAStartInstrumentationList.empty()) {
9358 // If there is a va_start in this function, make a backup copy of
9359 // va_arg_tls somewhere in the function entry block.
9360 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9361 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9362 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9363 CopySize, kShadowTLSAlignment, false);
9364
9365 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9366 Intrinsic::umin, CopySize,
9367 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9368 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9369 kShadowTLSAlignment, SrcSize);
9370 }
9371
9372 // Instrument va_start.
9373 // Copy va_list shadow from the backup copy of the TLS contents.
9374 for (CallInst *OrigInst : VAStartInstrumentationList) {
9375 NextNodeIRBuilder IRB(OrigInst);
9376 Value *VAListTag = OrigInst->getArgOperand(0);
9377 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
9378 Value *RegSaveAreaPtrPtr =
9379 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9380 PointerType::get(*MS.C, 0));
9381 Value *RegSaveAreaPtr =
9382 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
9383 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
9384 const DataLayout &DL = F.getDataLayout();
9385 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9386 const Align Alignment = Align(IntptrSize);
9387 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9388 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
9389 Alignment, /*isStore*/ true);
9390 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9391 CopySize);
9392 }
9393 }
9394};
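
// The generic helper assumes the target's va_list holds, at offset 0, a
// pointer to the saved-argument area (which matches the ARM32, MIPS, RISCV
// and LoongArch conventions it is used for): visitCallBase() lays the vararg
// shadow out linearly in the TLS, and each va_start copies it over the shadow
// of the memory that pointer refers to.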
9395
9396// ARM32, LoongArch64, MIPS and RISCV share the same calling conventions
9397// regarding VAArgs.
9398using VarArgARM32Helper = VarArgGenericHelper;
9399using VarArgRISCVHelper = VarArgGenericHelper;
9400using VarArgMIPSHelper = VarArgGenericHelper;
9401using VarArgLoongArch64Helper = VarArgGenericHelper;
9402using VarArgHexagonHelper = VarArgGenericHelper;
9403
9404/// A no-op implementation of VarArgHelper.
9405struct VarArgNoOpHelper : public VarArgHelper {
9406 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
9407 MemorySanitizerVisitor &MSV) {}
9408
9409 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
9410
9411 void visitVAStartInst(VAStartInst &I) override {}
9412
9413 void visitVACopyInst(VACopyInst &I) override {}
9414
9415 void finalizeInstrumentation() override {}
9416};
9417
9418} // end anonymous namespace
9419
9420static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
9421 MemorySanitizerVisitor &Visitor) {
9422 // VarArg handling is implemented only for the targets listed below; other
9423 // platforms fall back to VarArgNoOpHelper, where false positives are possible.
9424 Triple TargetTriple(Func.getParent()->getTargetTriple());
9425
9426 if (TargetTriple.getArch() == Triple::x86)
9427 return new VarArgI386Helper(Func, Msan, Visitor);
9428
9429 if (TargetTriple.getArch() == Triple::x86_64)
9430 return new VarArgAMD64Helper(Func, Msan, Visitor);
9431
9432 if (TargetTriple.isARM())
9433 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9434
9435 if (TargetTriple.isAArch64())
9436 return new VarArgAArch64Helper(Func, Msan, Visitor);
9437
9438 if (TargetTriple.isSystemZ())
9439 return new VarArgSystemZHelper(Func, Msan, Visitor);
9440
9441 // On PowerPC32 VAListTag is a struct
9442 // {char, char, i16 padding, char *, char *}
9443 if (TargetTriple.isPPC32())
9444 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
9445
9446 if (TargetTriple.isPPC64())
9447 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
9448
9449 if (TargetTriple.isRISCV32())
9450 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9451
9452 if (TargetTriple.isRISCV64())
9453 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9454
9455 if (TargetTriple.isMIPS32())
9456 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9457
9458 if (TargetTriple.isMIPS64())
9459 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9460
9461 if (TargetTriple.isLoongArch64())
9462 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
9463 /*VAListTagSize=*/8);
9464
9465 if (TargetTriple.getArch() == Triple::hexagon)
9466 return new VarArgHexagonHelper(Func, Msan, Visitor, /*VAListTagSize=*/12);
9467
9468 return new VarArgNoOpHelper(Func, Msan, Visitor);
9469}
9470
9471bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
9472 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
9473 return false;
9474
9475 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
9476 return false;
9477
9478 MemorySanitizerVisitor Visitor(F, *this, TLI);
9479
9480 // Clear out memory attributes.
9481 AttributeMask B;
9482 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
9483 F.removeFnAttrs(B);
9484
9485 return Visitor.runOnFunction();
9486}