1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned. (A small scalar model of these rules is sketched right after this header comment.)
21///
22/// But there are differences too. The first and the major one:
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow-path
35/// storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
56///
57/// Every aligned group of 4 consecutive bytes of application memory has one
58/// origin value associated with it. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needlessly overwriting the origin of the 4-byte region
66/// on a short (e.g. 1-byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of an application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, atomic store
72/// of two disjoint locations cannot be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics may only be visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach, generating calls to
101/// __msan_instrument_asm_store(ptr, size),
102/// which defer the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and return the pointers to shadow and origin memory.
122/// Arbitrary-sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted at the start of the entry block of every instrumented function;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, erring on the safe side with respect to possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
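// For illustration only (a minimal sketch, not part of the pass): a scalar
// model of the shadow rules described in the header comment above, where a
// set shadow bit means "may be uninitialized". The names below are
// hypothetical and exist only for this example.
namespace msan_doc_example {
inline unsigned char propagateBinOpShadow(unsigned char ShadowA,
                                          unsigned char ShadowB) {
  // Approximate propagation rule used for many arithmetic and bitwise
  // instructions: any uninitialized bit of either operand may taint the
  // result.
  return ShadowA | ShadowB;
}
inline bool mustReport(unsigned char ShadowOfUsedValue) {
  // Instructions that act on a value (branches, indirect calls, memory
  // addresses, ...) are checked: a non-zero shadow triggers __msan_warning*.
  return ShadowOfUsedValue != 0;
}
} // namespace msan_doc_example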
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
225static const Align kMinOriginAlignment = Align(4);
226static const Align kShadowTLSAlignment = Align(8);
227
228// These constants must be kept in sync with the ones in msan.h.
229// TODO: increase size to match SVE/SVE2/SME/SME2 limits
230static const unsigned kParamTLSSize = 800;
231static const unsigned kRetvalTLSSize = 800;
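// 800 bytes give 100 64-bit shadow slots (kParamTLSSize / 8 below) and 200
// 4-byte origin slots (kParamTLSSize / 4).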
232
233// Access sizes are powers of two: 1, 2, 4, 8.
234static const size_t kNumberOfAccessSizes = 4;
235
236/// Track origins of uninitialized values.
237///
238/// Adds a section to MemorySanitizer report that points to the allocation
239/// (stack or heap) the uninitialized bits came from originally.
241 "msan-track-origins",
242 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
243 cl::init(0));
244
245static cl::opt<bool> ClKeepGoing("msan-keep-going",
246 cl::desc("keep going after reporting a UMR"),
247 cl::Hidden, cl::init(false));
248
249static cl::opt<bool>
250 ClPoisonStack("msan-poison-stack",
251 cl::desc("poison uninitialized stack variables"), cl::Hidden,
252 cl::init(true));
253
255 "msan-poison-stack-with-call",
256 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
257 cl::init(false));
258
260 "msan-poison-stack-pattern",
261 cl::desc("poison uninitialized stack variables with the given pattern"),
262 cl::Hidden, cl::init(0xff));
263
264static cl::opt<bool>
265 ClPrintStackNames("msan-print-stack-names",
266 cl::desc("Print name of local stack variable"),
267 cl::Hidden, cl::init(true));
268
269static cl::opt<bool>
270 ClPoisonUndef("msan-poison-undef",
271 cl::desc("Poison fully undef temporary values. "
272 "Partially undefined constant vectors "
273 "are unaffected by this flag (see "
274 "-msan-poison-undef-vectors)."),
275 cl::Hidden, cl::init(true));
276
278 "msan-poison-undef-vectors",
279 cl::desc("Precisely poison partially undefined constant vectors. "
280 "If false (legacy behavior), the entire vector is "
281 "considered fully initialized, which may lead to false "
282 "negatives. Fully undefined constant vectors are "
283 "unaffected by this flag (see -msan-poison-undef)."),
284 cl::Hidden, cl::init(false));
285
287 "msan-precise-disjoint-or",
288 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
289 "disjointedness is ignored (i.e., 1|1 is initialized)."),
290 cl::Hidden, cl::init(false));
291
292static cl::opt<bool>
293 ClHandleICmp("msan-handle-icmp",
294 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
295 cl::Hidden, cl::init(true));
296
297static cl::opt<bool>
298 ClHandleICmpExact("msan-handle-icmp-exact",
299 cl::desc("exact handling of relational integer ICmp"),
300 cl::Hidden, cl::init(true));
301
303 "msan-handle-lifetime-intrinsics",
304 cl::desc(
305 "when possible, poison scoped variables at the beginning of the scope "
306 "(slower, but more precise)"),
307 cl::Hidden, cl::init(true));
308
309// When compiling the Linux kernel, we sometimes see false positives related to
310// MSan being unable to understand that inline assembly calls may initialize
311// local variables.
312// This flag makes the compiler conservatively unpoison every memory location
313// passed into an assembly call. Note that this may cause false negatives.
314// Because it's impossible to figure out the array sizes, we can only unpoison
315// the first sizeof(type) bytes for each type* pointer.
317 "msan-handle-asm-conservative",
318 cl::desc("conservative handling of inline assembly"), cl::Hidden,
319 cl::init(true));
320
321// This flag controls whether we check the shadow of the address
322// operand of load or store. Such bugs are very rare, since load from
323// a garbage address typically results in SEGV, but still happen
324// (e.g. only lower bits of address are garbage, or the access happens
325// early at program startup where malloc-ed memory is more likely to
326// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
328 "msan-check-access-address",
329 cl::desc("report accesses through a pointer which has poisoned shadow"),
330 cl::Hidden, cl::init(true));
331
333 "msan-eager-checks",
334 cl::desc("check arguments and return values at function call boundaries"),
335 cl::Hidden, cl::init(false));
336
338 "msan-dump-strict-instructions",
339 cl::desc("print out instructions with default strict semantics, i.e., "
340 "check that all the inputs are fully initialized, and mark "
341 "the output as fully initialized. These semantics are applied "
342 "to instructions that could not be handled explicitly nor "
343 "heuristically."),
344 cl::Hidden, cl::init(false));
345
346// Currently, all the heuristically handled instructions are specifically
347// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
348// to parallel 'msan-dump-strict-instructions', and to keep the door open to
349// handling non-intrinsic instructions heuristically.
351 "msan-dump-heuristic-instructions",
352 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
353 "Use -msan-dump-strict-instructions to print instructions that "
354 "could not be handled explicitly nor heuristically."),
355 cl::Hidden, cl::init(false));
356
358 "msan-instrumentation-with-call-threshold",
359 cl::desc(
360 "If the function being instrumented requires more than "
361 "this number of checks and origin stores, use callbacks instead of "
362 "inline checks (-1 means never use callbacks)."),
363 cl::Hidden, cl::init(3500));
364
365static cl::opt<bool>
366 ClEnableKmsan("msan-kernel",
367 cl::desc("Enable KernelMemorySanitizer instrumentation"),
368 cl::Hidden, cl::init(false));
369
370static cl::opt<bool>
371 ClDisableChecks("msan-disable-checks",
372 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
373 cl::init(false));
374
375static cl::opt<bool>
376 ClCheckConstantShadow("msan-check-constant-shadow",
377 cl::desc("Insert checks for constant shadow values"),
378 cl::Hidden, cl::init(true));
379
380// This is off by default because of a bug in gold:
381// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
382static cl::opt<bool>
383 ClWithComdat("msan-with-comdat",
384 cl::desc("Place MSan constructors in comdat sections"),
385 cl::Hidden, cl::init(false));
386
387// These options allow specifying custom memory map parameters.
388// See MemoryMapParams for details.
389static cl::opt<uint64_t> ClAndMask("msan-and-mask",
390 cl::desc("Define custom MSan AndMask"),
391 cl::Hidden, cl::init(0));
392
393static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
394 cl::desc("Define custom MSan XorMask"),
395 cl::Hidden, cl::init(0));
396
397static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
398 cl::desc("Define custom MSan ShadowBase"),
399 cl::Hidden, cl::init(0));
400
401static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
402 cl::desc("Define custom MSan OriginBase"),
403 cl::Hidden, cl::init(0));
404
405static cl::opt<int>
406 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
407 cl::desc("Define threshold for number of checks per "
408 "debug location to force origin update."),
409 cl::Hidden, cl::init(3));
410
411const char kMsanModuleCtorName[] = "msan.module_ctor";
412const char kMsanInitName[] = "__msan_init";
413
414namespace {
415
416// Memory map parameters used in application-to-shadow address calculation.
417// Offset = (Addr & ~AndMask) ^ XorMask
418// Shadow = ShadowBase + Offset
419// Origin = OriginBase + Offset
420struct MemoryMapParams {
421 uint64_t AndMask;
422 uint64_t XorMask;
423 uint64_t ShadowBase;
424 uint64_t OriginBase;
425};
426
427struct PlatformMemoryMapParams {
428 const MemoryMapParams *bits32;
429 const MemoryMapParams *bits64;
430};
431
432} // end anonymous namespace
433
434// i386 Linux
435static const MemoryMapParams Linux_I386_MemoryMapParams = {
436 0x000080000000, // AndMask
437 0, // XorMask (not used)
438 0, // ShadowBase (not used)
439 0x000040000000, // OriginBase
440};
441
442// x86_64 Linux
443static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
444 0, // AndMask (not used)
445 0x500000000000, // XorMask
446 0, // ShadowBase (not used)
447 0x100000000000, // OriginBase
448};
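// For illustration only: the mapping documented at MemoryMapParams
// (Offset = (Addr & ~AndMask) ^ XorMask), applied to the x86_64 Linux
// parameters above. The application address is an arbitrary example value.
static_assert(((0x7fff12345678ULL & ~0x0ULL) ^ 0x500000000000ULL) ==
                  0x2fff12345678ULL,
              "Offset = (Addr & ~AndMask) ^ XorMask");
static_assert((0x0ULL + 0x2fff12345678ULL) == 0x2fff12345678ULL,
              "Shadow = ShadowBase + Offset");
static_assert((0x100000000000ULL + 0x2fff12345678ULL) == 0x3fff12345678ULL,
              "Origin = OriginBase + Offset");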
449
450// mips32 Linux
451// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
452// after picking good constants
453
454// mips64 Linux
455static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
456 0, // AndMask (not used)
457 0x008000000000, // XorMask
458 0, // ShadowBase (not used)
459 0x002000000000, // OriginBase
460};
461
462// ppc32 Linux
463// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
464// after picking good constants
465
466// ppc64 Linux
467static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
468 0xE00000000000, // AndMask
469 0x100000000000, // XorMask
470 0x080000000000, // ShadowBase
471 0x1C0000000000, // OriginBase
472};
473
474// s390x Linux
475static const MemoryMapParams Linux_S390X_MemoryMapParams = {
476 0xC00000000000, // AndMask
477 0, // XorMask (not used)
478 0x080000000000, // ShadowBase
479 0x1C0000000000, // OriginBase
480};
481
482// arm32 Linux
483// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
484// after picking good constants
485
486// aarch64 Linux
487static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
488 0, // AndMask (not used)
489 0x0B00000000000, // XorMask
490 0, // ShadowBase (not used)
491 0x0200000000000, // OriginBase
492};
493
494// loongarch64 Linux
495static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
496 0, // AndMask (not used)
497 0x500000000000, // XorMask
498 0, // ShadowBase (not used)
499 0x100000000000, // OriginBase
500};
501
502// riscv32 Linux
503// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
504// after picking good constants
505
506// aarch64 FreeBSD
507static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
508 0x1800000000000, // AndMask
509 0x0400000000000, // XorMask
510 0x0200000000000, // ShadowBase
511 0x0700000000000, // OriginBase
512};
513
514// i386 FreeBSD
515static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
516 0x000180000000, // AndMask
517 0x000040000000, // XorMask
518 0x000020000000, // ShadowBase
519 0x000700000000, // OriginBase
520};
521
522// x86_64 FreeBSD
523static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
524 0xc00000000000, // AndMask
525 0x200000000000, // XorMask
526 0x100000000000, // ShadowBase
527 0x380000000000, // OriginBase
528};
529
530// x86_64 NetBSD
531static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
532 0, // AndMask
533 0x500000000000, // XorMask
534 0, // ShadowBase
535 0x100000000000, // OriginBase
536};
537
538static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
539 &Linux_I386_MemoryMapParams,
540 &Linux_X86_64_MemoryMapParams,
541};
542
543static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
544 nullptr,
545 &Linux_MIPS64_MemoryMapParams,
546};
547
548static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
549 nullptr,
550 &Linux_PowerPC64_MemoryMapParams,
551};
552
553static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
554 nullptr,
555 &Linux_S390X_MemoryMapParams,
556};
557
558static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
559 nullptr,
560 &Linux_AArch64_MemoryMapParams,
561};
562
563static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
564 nullptr,
565 &Linux_LoongArch64_MemoryMapParams,
566};
567
568static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
569 nullptr,
570 &FreeBSD_AArch64_MemoryMapParams,
571};
572
573static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
574 &FreeBSD_I386_MemoryMapParams,
575 &FreeBSD_X86_64_MemoryMapParams,
576};
577
578static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
579 nullptr,
580 &NetBSD_X86_64_MemoryMapParams,
581};
582
583namespace {
584
585/// Instrument functions of a module to detect uninitialized reads.
586///
587/// Instantiating MemorySanitizer inserts the msan runtime library API function
588/// declarations into the module if they don't exist already. Instantiating
589/// ensures the __msan_init function is in the list of global constructors for
590/// the module.
591class MemorySanitizer {
592public:
593 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
594 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
595 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
596 initializeModule(M);
597 }
598
599 // MSan cannot be moved or copied because of MapParams.
600 MemorySanitizer(MemorySanitizer &&) = delete;
601 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
602 MemorySanitizer(const MemorySanitizer &) = delete;
603 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
604
605 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
606
607private:
608 friend struct MemorySanitizerVisitor;
609 friend struct VarArgHelperBase;
610 friend struct VarArgAMD64Helper;
611 friend struct VarArgAArch64Helper;
612 friend struct VarArgPowerPC64Helper;
613 friend struct VarArgPowerPC32Helper;
614 friend struct VarArgSystemZHelper;
615 friend struct VarArgI386Helper;
616 friend struct VarArgGenericHelper;
617
618 void initializeModule(Module &M);
619 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
620 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
621 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
622
623 template <typename... ArgsTy>
624 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
625 ArgsTy... Args);
626
627 /// True if we're compiling the Linux kernel.
628 bool CompileKernel;
629 /// Track origins (allocation points) of uninitialized values.
630 int TrackOrigins;
631 bool Recover;
632 bool EagerChecks;
633
634 Triple TargetTriple;
635 LLVMContext *C;
636 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
637 Type *OriginTy;
638 PointerType *PtrTy; ///< Pointer type in the default address space.
639
640 // XxxTLS variables represent the per-thread state in MSan and per-task state
641 // in KMSAN.
642 // For the userspace these point to thread-local globals. In the kernel land
643 // they point to the members of a per-task struct obtained via a call to
644 // __msan_get_context_state().
645
646 /// Thread-local shadow storage for function parameters.
647 Value *ParamTLS;
648
649 /// Thread-local origin storage for function parameters.
650 Value *ParamOriginTLS;
651
652 /// Thread-local shadow storage for function return value.
653 Value *RetvalTLS;
654
655 /// Thread-local origin storage for function return value.
656 Value *RetvalOriginTLS;
657
658 /// Thread-local shadow storage for in-register va_arg function.
659 Value *VAArgTLS;
660
661 /// Thread-local origin storage for in-register va_arg function.
662 Value *VAArgOriginTLS;
663
664 /// Thread-local storage for the size of the va_arg overflow area.
665 Value *VAArgOverflowSizeTLS;
666
667 /// Are the instrumentation callbacks set up?
668 bool CallbacksInitialized = false;
669
670 /// The run-time callback to print a warning.
671 FunctionCallee WarningFn;
672
673 // These arrays are indexed by log2(AccessSize).
674 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
675 FunctionCallee MaybeWarningVarSizeFn;
676 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
677
678 /// Run-time helper that generates a new origin value for a stack
679 /// allocation.
680 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
681 // No description version
682 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
683
684 /// Run-time helper that poisons stack on function entry.
685 FunctionCallee MsanPoisonStackFn;
686
687 /// Run-time helper that records a store (or any event) of an
688 /// uninitialized value and returns an updated origin id encoding this info.
689 FunctionCallee MsanChainOriginFn;
690
691 /// Run-time helper that paints an origin over a region.
692 FunctionCallee MsanSetOriginFn;
693
694 /// MSan runtime replacements for memmove, memcpy and memset.
695 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
696
697 /// KMSAN callback for task-local function argument shadow.
698 StructType *MsanContextStateTy;
699 FunctionCallee MsanGetContextStateFn;
700
701 /// Functions for poisoning/unpoisoning local variables
702 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
703
704 /// Pair of shadow/origin pointers.
705 Type *MsanMetadata;
706
707 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
708 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
709 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
710 FunctionCallee MsanMetadataPtrForStore_1_8[4];
711 FunctionCallee MsanInstrumentAsmStoreFn;
712
713 /// Storage for return values of the MsanMetadataPtrXxx functions.
714 Value *MsanMetadataAlloca;
715
716 /// Helper to choose between different MsanMetadataPtrXxx().
717 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
718
719 /// Memory map parameters used in application-to-shadow calculation.
720 const MemoryMapParams *MapParams;
721
722 /// Custom memory map parameters used when -msan-shadow-base or
723 /// -msan-origin-base is provided.
724 MemoryMapParams CustomMapParams;
725
726 MDNode *ColdCallWeights;
727
728 /// Branch weights for origin store.
729 MDNode *OriginStoreWeights;
730};
731
732void insertModuleCtor(Module &M) {
733 getOrCreateSanitizerCtorAndInitFunctions(
734 M, kMsanModuleCtorName, kMsanInitName,
735 /*InitArgTypes=*/{},
736 /*InitArgs=*/{},
737 // This callback is invoked when the functions are created the first
738 // time. Hook them into the global ctors list in that case:
739 [&](Function *Ctor, FunctionCallee) {
740 if (!ClWithComdat) {
741 appendToGlobalCtors(M, Ctor, 0);
742 return;
743 }
744 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
745 Ctor->setComdat(MsanCtorComdat);
746 appendToGlobalCtors(M, Ctor, 0, Ctor);
747 });
748}
749
750template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
751 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
752}
753
754} // end anonymous namespace
755
756MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
757 bool EagerChecks)
758 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
759 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
760 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
761 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
762
763PreservedAnalyses MemorySanitizerPass::run(Module &M,
764 ModuleAnalysisManager &AM) {
765 // Return early if nosanitize_memory module flag is present for the module.
766 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
767 return PreservedAnalyses::all();
768 bool Modified = false;
769 if (!Options.Kernel) {
770 insertModuleCtor(M);
771 Modified = true;
772 }
773
774 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
775 for (Function &F : M) {
776 if (F.empty())
777 continue;
778 MemorySanitizer Msan(*F.getParent(), Options);
779 Modified |=
780 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
781 }
782
783 if (!Modified)
784 return PreservedAnalyses::all();
785
786 PreservedAnalyses PA = PreservedAnalyses::none();
787 // GlobalsAA is considered stateless and does not get invalidated unless
788 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
789 // make changes that require GlobalsAA to be invalidated.
790 PA.abandon<GlobalsAA>();
791 return PA;
792}
793
794void MemorySanitizerPass::printPipeline(
795 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
796 static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
797 OS, MapClassName2PassName);
798 OS << '<';
799 if (Options.Recover)
800 OS << "recover;";
801 if (Options.Kernel)
802 OS << "kernel;";
803 if (Options.EagerChecks)
804 OS << "eager-checks;";
805 OS << "track-origins=" << Options.TrackOrigins;
806 OS << '>';
807}
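// For example, options {Kernel=false, Recover=true, EagerChecks=false,
// TrackOrigins=1} print as something like "msan<recover;track-origins=1>",
// depending on the pass name supplied by MapClassName2PassName.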
808
809/// Create a non-const global initialized with the given string.
810///
811/// Creates a writable global for Str so that we can pass it to the
812/// run-time lib. Runtime uses first 4 bytes of the string to store the
813/// frame ID, so the string needs to be mutable.
815 StringRef Str) {
816 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
817 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
818 GlobalValue::PrivateLinkage, StrConst, "");
819}
820
821template <typename... ArgsTy>
823MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
824 ArgsTy... Args) {
825 if (TargetTriple.getArch() == Triple::systemz) {
826 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
827 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
828 std::forward<ArgsTy>(Args)...);
829 }
830
831 return M.getOrInsertFunction(Name, MsanMetadata,
832 std::forward<ArgsTy>(Args)...);
833}
834
835/// Create KMSAN API callbacks.
836void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
837 IRBuilder<> IRB(*C);
838
839 // These will be initialized in insertKmsanPrologue().
840 RetvalTLS = nullptr;
841 RetvalOriginTLS = nullptr;
842 ParamTLS = nullptr;
843 ParamOriginTLS = nullptr;
844 VAArgTLS = nullptr;
845 VAArgOriginTLS = nullptr;
846 VAArgOverflowSizeTLS = nullptr;
847
848 WarningFn = M.getOrInsertFunction("__msan_warning",
849 TLI.getAttrList(C, {0}, /*Signed=*/false),
850 IRB.getVoidTy(), IRB.getInt32Ty());
851
852 // Requests the per-task context state (kmsan_context_state*) from the
853 // runtime library.
854 MsanContextStateTy = StructType::get(
855 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
856 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
858 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
859 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
860 OriginTy);
861 MsanGetContextStateFn =
862 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
863
864 MsanMetadata = StructType::get(PtrTy, PtrTy);
865
866 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
867 std::string name_load =
868 "__msan_metadata_ptr_for_load_" + std::to_string(size);
869 std::string name_store =
870 "__msan_metadata_ptr_for_store_" + std::to_string(size);
871 MsanMetadataPtrForLoad_1_8[ind] =
872 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
873 MsanMetadataPtrForStore_1_8[ind] =
874 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
875 }
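  // The loop above declares __msan_metadata_ptr_for_load_{1,2,4,8} and
  // __msan_metadata_ptr_for_store_{1,2,4,8}; each takes the application
  // pointer and yields the shadow/origin pointer pair (via the return value,
  // or via a hidden parameter on SystemZ).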
876
877 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
878 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
879 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
880 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
881
882 // Functions for poisoning and unpoisoning memory.
883 MsanPoisonAllocaFn = M.getOrInsertFunction(
884 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
885 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
886 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
887}
888
890 return M.getOrInsertGlobal(Name, Ty, [&] {
891 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
892 nullptr, Name, nullptr,
894 });
895}
896
897/// Insert declarations for userspace-specific functions and globals.
898void MemorySanitizer::createUserspaceApi(Module &M,
899 const TargetLibraryInfo &TLI) {
900 IRBuilder<> IRB(*C);
901
902 // Create the callback.
903 // FIXME: this function should have "Cold" calling conv,
904 // which is not yet implemented.
905 if (TrackOrigins) {
906 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
907 : "__msan_warning_with_origin_noreturn";
908 WarningFn = M.getOrInsertFunction(WarningFnName,
909 TLI.getAttrList(C, {0}, /*Signed=*/false),
910 IRB.getVoidTy(), IRB.getInt32Ty());
911 } else {
912 StringRef WarningFnName =
913 Recover ? "__msan_warning" : "__msan_warning_noreturn";
914 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
915 }
916
917 // Create the global TLS variables.
918 RetvalTLS =
919 getOrInsertGlobal(M, "__msan_retval_tls",
920 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
921
922 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
923
924 ParamTLS =
925 getOrInsertGlobal(M, "__msan_param_tls",
926 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
927
928 ParamOriginTLS =
929 getOrInsertGlobal(M, "__msan_param_origin_tls",
930 ArrayType::get(OriginTy, kParamTLSSize / 4));
931
932 VAArgTLS =
933 getOrInsertGlobal(M, "__msan_va_arg_tls",
934 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
935
936 VAArgOriginTLS =
937 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
938 ArrayType::get(OriginTy, kParamTLSSize / 4));
939
940 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
941 IRB.getIntPtrTy(M.getDataLayout()));
942
943 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
944 AccessSizeIndex++) {
945 unsigned AccessSize = 1 << AccessSizeIndex;
946 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
947 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
948 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
949 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
950 MaybeWarningVarSizeFn = M.getOrInsertFunction(
951 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
952 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
953 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
954 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
955 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
956 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
957 IRB.getInt32Ty());
958 }
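  // The loop above declares __msan_maybe_warning_{1,2,4,8} and
  // __msan_maybe_store_origin_{1,2,4,8}; __msan_maybe_warning_N covers
  // accesses that do not fit one of the four power-of-two sizes.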
959
960 MsanSetAllocaOriginWithDescriptionFn =
961 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
962 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
963 MsanSetAllocaOriginNoDescriptionFn =
964 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
965 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
966 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
967 IRB.getVoidTy(), PtrTy, IntptrTy);
968}
969
970/// Insert extern declarations of runtime-provided functions and globals.
971void MemorySanitizer::initializeCallbacks(Module &M,
972 const TargetLibraryInfo &TLI) {
973 // Only do this once.
974 if (CallbacksInitialized)
975 return;
976
977 IRBuilder<> IRB(*C);
978 // Initialize callbacks that are common for kernel and userspace
979 // instrumentation.
980 MsanChainOriginFn = M.getOrInsertFunction(
981 "__msan_chain_origin",
982 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
983 IRB.getInt32Ty());
984 MsanSetOriginFn = M.getOrInsertFunction(
985 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
986 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
987 MemmoveFn =
988 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
989 MemcpyFn =
990 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
991 MemsetFn = M.getOrInsertFunction("__msan_memset",
992 TLI.getAttrList(C, {1}, /*Signed=*/true),
993 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
994
995 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
996 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
997
998 if (CompileKernel) {
999 createKernelApi(M, TLI);
1000 } else {
1001 createUserspaceApi(M, TLI);
1002 }
1003 CallbacksInitialized = true;
1004}
1005
1006FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1007 int size) {
1008 FunctionCallee *Fns =
1009 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1010 switch (size) {
1011 case 1:
1012 return Fns[0];
1013 case 2:
1014 return Fns[1];
1015 case 4:
1016 return Fns[2];
1017 case 8:
1018 return Fns[3];
1019 default:
1020 return nullptr;
1021 }
1022}
1023
1024/// Module-level initialization.
1025///
1026/// Inserts a call to __msan_init into the module's constructor list.
1027void MemorySanitizer::initializeModule(Module &M) {
1028 auto &DL = M.getDataLayout();
1029
1030 TargetTriple = M.getTargetTriple();
1031
1032 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1033 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1034 // Check the overrides first
1035 if (ShadowPassed || OriginPassed) {
1036 CustomMapParams.AndMask = ClAndMask;
1037 CustomMapParams.XorMask = ClXorMask;
1038 CustomMapParams.ShadowBase = ClShadowBase;
1039 CustomMapParams.OriginBase = ClOriginBase;
1040 MapParams = &CustomMapParams;
1041 } else {
1042 switch (TargetTriple.getOS()) {
1043 case Triple::FreeBSD:
1044 switch (TargetTriple.getArch()) {
1045 case Triple::aarch64:
1046 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1047 break;
1048 case Triple::x86_64:
1049 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1050 break;
1051 case Triple::x86:
1052 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1053 break;
1054 default:
1055 report_fatal_error("unsupported architecture");
1056 }
1057 break;
1058 case Triple::NetBSD:
1059 switch (TargetTriple.getArch()) {
1060 case Triple::x86_64:
1061 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1062 break;
1063 default:
1064 report_fatal_error("unsupported architecture");
1065 }
1066 break;
1067 case Triple::Linux:
1068 switch (TargetTriple.getArch()) {
1069 case Triple::x86_64:
1070 MapParams = Linux_X86_MemoryMapParams.bits64;
1071 break;
1072 case Triple::x86:
1073 MapParams = Linux_X86_MemoryMapParams.bits32;
1074 break;
1075 case Triple::mips64:
1076 case Triple::mips64el:
1077 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1078 break;
1079 case Triple::ppc64:
1080 case Triple::ppc64le:
1081 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1082 break;
1083 case Triple::systemz:
1084 MapParams = Linux_S390_MemoryMapParams.bits64;
1085 break;
1086 case Triple::aarch64:
1087 case Triple::aarch64_be:
1088 MapParams = Linux_ARM_MemoryMapParams.bits64;
1089 break;
1090 case Triple::loongarch64:
1091 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1092 break;
1093 default:
1094 report_fatal_error("unsupported architecture");
1095 }
1096 break;
1097 default:
1098 report_fatal_error("unsupported operating system");
1099 }
1100 }
1101
1102 C = &(M.getContext());
1103 IRBuilder<> IRB(*C);
1104 IntptrTy = IRB.getIntPtrTy(DL);
1105 OriginTy = IRB.getInt32Ty();
1106 PtrTy = IRB.getPtrTy();
1107
1108 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1109 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1110
1111 if (!CompileKernel) {
1112 if (TrackOrigins)
1113 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1114 return new GlobalVariable(
1115 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1116 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1117 });
1118
1119 if (Recover)
1120 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1121 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1122 GlobalValue::WeakODRLinkage,
1123 IRB.getInt32(Recover), "__msan_keep_going");
1124 });
1125 }
1126}
1127
1128namespace {
1129
1130/// A helper class that handles instrumentation of VarArg
1131/// functions on a particular platform.
1132///
1133/// Implementations are expected to insert the instrumentation
1134/// necessary to propagate argument shadow through VarArg function
1135/// calls. Visit* methods are called during an InstVisitor pass over
1136/// the function, and should avoid creating new basic blocks. A new
1137/// instance of this class is created for each instrumented function.
1138struct VarArgHelper {
1139 virtual ~VarArgHelper() = default;
1140
1141 /// Visit a CallBase.
1142 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1143
1144 /// Visit a va_start call.
1145 virtual void visitVAStartInst(VAStartInst &I) = 0;
1146
1147 /// Visit a va_copy call.
1148 virtual void visitVACopyInst(VACopyInst &I) = 0;
1149
1150 /// Finalize function instrumentation.
1151 ///
1152 /// This method is called after visiting all interesting (see above)
1153 /// instructions in a function.
1154 virtual void finalizeInstrumentation() = 0;
1155};
1156
1157struct MemorySanitizerVisitor;
1158
1159} // end anonymous namespace
1160
1161static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1162 MemorySanitizerVisitor &Visitor);
1163
1164static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1165 if (TS.isScalable())
1166 // Scalable types unconditionally take slowpaths.
1167 return kNumberOfAccessSizes;
1168 unsigned TypeSizeFixed = TS.getFixedValue();
1169 if (TypeSizeFixed <= 8)
1170 return 0;
1171 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1172}
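// For illustration only: a constexpr mirror of TypeSizeToSizeIndex for fixed
// sizes (the helper name is hypothetical and is not used by the pass).
//   8 bits -> 0, 16 -> 1, 32 -> 2, 64 -> 3, 128 -> kNumberOfAccessSizes.
static constexpr unsigned exampleTypeSizeToSizeIndex(unsigned Bits) {
  unsigned Bytes = (Bits + 7) / 8;
  unsigned Index = 0;
  while ((1u << Index) < Bytes)
    ++Index;
  return Index;
}
static_assert(exampleTypeSizeToSizeIndex(64) == 3,
              "an 8-byte access uses the last fast-path slot");
static_assert(exampleTypeSizeToSizeIndex(128) == kNumberOfAccessSizes,
              "wider accesses fall back to the variable-size handling");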
1173
1174namespace {
1175
1176/// Helper class to attach the debug information of the given instruction to
1177/// new instructions inserted after it.
1178class NextNodeIRBuilder : public IRBuilder<> {
1179public:
1180 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1181 SetCurrentDebugLocation(IP->getDebugLoc());
1182 }
1183};
1184
1185/// This class does all the work for a given function. Store and Load
1186/// instructions store and load corresponding shadow and origin
1187/// values. Most instructions propagate shadow from arguments to their
1188/// return values. Certain instructions (most importantly, BranchInst)
1189/// test their argument shadow and print reports (with a runtime call) if it's
1190/// non-zero.
1191struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1192 Function &F;
1193 MemorySanitizer &MS;
1194 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1195 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1196 std::unique_ptr<VarArgHelper> VAHelper;
1197 const TargetLibraryInfo *TLI;
1198 Instruction *FnPrologueEnd;
1199 SmallVector<Instruction *, 16> Instructions;
1200
1201 // The following flags disable parts of MSan instrumentation based on
1202 // exclusion list contents and command-line options.
1203 bool InsertChecks;
1204 bool PropagateShadow;
1205 bool PoisonStack;
1206 bool PoisonUndef;
1207 bool PoisonUndefVectors;
1208
1209 struct ShadowOriginAndInsertPoint {
1210 Value *Shadow;
1211 Value *Origin;
1212 Instruction *OrigIns;
1213
1214 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1215 : Shadow(S), Origin(O), OrigIns(I) {}
1216 };
1217 SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1218 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1219 SmallSetVector<AllocaInst *, 16> AllocaSet;
1222 int64_t SplittableBlocksCount = 0;
1223
1224 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1225 const TargetLibraryInfo &TLI)
1226 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1227 bool SanitizeFunction =
1228 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1229 InsertChecks = SanitizeFunction;
1230 PropagateShadow = SanitizeFunction;
1231 PoisonStack = SanitizeFunction && ClPoisonStack;
1232 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1233 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1234
1235 // In the presence of unreachable blocks, we may see Phi nodes with
1236 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1237 // blocks, such nodes will not have any shadow value associated with them.
1238 // It's easier to remove unreachable blocks than deal with missing shadow.
1239 removeUnreachableBlocks(F);
1240
1241 MS.initializeCallbacks(*F.getParent(), TLI);
1242 FnPrologueEnd =
1243 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1244 .CreateIntrinsic(Intrinsic::donothing, {});
1245
1246 if (MS.CompileKernel) {
1247 IRBuilder<> IRB(FnPrologueEnd);
1248 insertKmsanPrologue(IRB);
1249 }
1250
1251 LLVM_DEBUG(if (!InsertChecks) dbgs()
1252 << "MemorySanitizer is not inserting checks into '"
1253 << F.getName() << "'\n");
1254 }
1255
1256 bool instrumentWithCalls(Value *V) {
1257 // Constants likely will be eliminated by follow-up passes.
1258 if (isa<Constant>(V))
1259 return false;
1260 ++SplittableBlocksCount;
1261 return ClInstrumentationWithCallThreshold >= 0 &&
1262 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1263 }
1264
1265 bool isInPrologue(Instruction &I) {
1266 return I.getParent() == FnPrologueEnd->getParent() &&
1267 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1268 }
1269
1270 // Creates a new origin and records the stack trace. In general we can call
1271 // this function for any origin manipulation we like. However, it costs
1272 // runtime resources, so use it sparingly and only where it provides
1273 // additional information helpful to the user.
1274 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1275 if (MS.TrackOrigins <= 1)
1276 return V;
1277 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1278 }
1279
1280 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1281 const DataLayout &DL = F.getDataLayout();
1282 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1283 if (IntptrSize == kOriginSize)
1284 return Origin;
1285 assert(IntptrSize == kOriginSize * 2);
1286 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1287 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1288 }
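  // For illustration only: on a 64-bit target the 4-byte origin id is
  // replicated into both halves of the intptr-sized word, so a single wide
  // store below paints two adjacent 4-byte origin slots.
  static_assert((0x11223344ULL | (0x11223344ULL << (kOriginSize * 8))) ==
                    0x1122334411223344ULL,
                "example: origin 0x11223344 widens to 0x1122334411223344");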
1289
1290 /// Fill memory range with the given origin value.
1291 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1292 TypeSize TS, Align Alignment) {
1293 const DataLayout &DL = F.getDataLayout();
1294 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1295 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1296 assert(IntptrAlignment >= kMinOriginAlignment);
1297 assert(IntptrSize >= kOriginSize);
1298
1299 // Note: The loop-based form works for fixed-length vectors too; however,
1300 // we prefer to unroll and specialize alignment below.
1301 if (TS.isScalable()) {
1302 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1303 Value *RoundUp =
1304 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1305 Value *End =
1306 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1307 auto [InsertPt, Index] =
1309 IRB.SetInsertPoint(InsertPt);
1310
1311 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1313 return;
1314 }
1315
1316 unsigned Size = TS.getFixedValue();
1317
1318 unsigned Ofs = 0;
1319 Align CurrentAlignment = Alignment;
1320 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1321 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1322 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1323 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1324 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1325 : IntptrOriginPtr;
1326 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1327 Ofs += IntptrSize / kOriginSize;
1328 CurrentAlignment = IntptrAlignment;
1329 }
1330 }
1331
1332 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1333 Value *GEP =
1334 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1335 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1336 CurrentAlignment = kMinOriginAlignment;
1337 }
1338 }
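  // For example, on a 64-bit target a 16-byte shadow stored with 8-byte
  // alignment paints origins with two 8-byte stores of the replicated origin,
  // while a 12-byte region with 4-byte alignment falls through to three
  // 4-byte stores.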
1339
1340 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1341 Value *OriginPtr, Align Alignment) {
1342 const DataLayout &DL = F.getDataLayout();
1343 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1344 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1345 // ZExt cannot convert between vector and scalar
1346 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1347 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1348 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1349 // Origin is not needed: value is initialized or const shadow is
1350 // ignored.
1351 return;
1352 }
1353 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1354 // Copy origin as the value is definitely uninitialized.
1355 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1356 OriginAlignment);
1357 return;
1358 }
1359 // Fall back to a runtime check, which can still be optimized out later.
1360 }
1361
1362 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1363 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1364 if (instrumentWithCalls(ConvertedShadow) &&
1365 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1366 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1367 Value *ConvertedShadow2 =
1368 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1369 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1370 CB->addParamAttr(0, Attribute::ZExt);
1371 CB->addParamAttr(2, Attribute::ZExt);
1372 } else {
1373 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1374 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1375 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1376 IRBuilder<> IRBNew(CheckTerm);
1377 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1378 OriginAlignment);
1379 }
1380 }
1381
1382 void materializeStores() {
1383 for (StoreInst *SI : StoreList) {
1384 IRBuilder<> IRB(SI);
1385 Value *Val = SI->getValueOperand();
1386 Value *Addr = SI->getPointerOperand();
1387 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1388 Value *ShadowPtr, *OriginPtr;
1389 Type *ShadowTy = Shadow->getType();
1390 const Align Alignment = SI->getAlign();
1391 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1392 std::tie(ShadowPtr, OriginPtr) =
1393 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1394
1395 [[maybe_unused]] StoreInst *NewSI =
1396 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1397 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1398
1399 if (SI->isAtomic())
1400 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1401
1402 if (MS.TrackOrigins && !SI->isAtomic())
1403 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1404 OriginAlignment);
1405 }
1406 }
1407
1408 // Returns true if Debug Location corresponds to multiple warnings.
1409 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1410 if (MS.TrackOrigins < 2)
1411 return false;
1412
1413 if (LazyWarningDebugLocationCount.empty())
1414 for (const auto &I : InstrumentationList)
1415 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1416
1417 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1418 }
1419
1420 /// Helper function to insert a warning at IRB's current insert point.
1421 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1422 if (!Origin)
1423 Origin = (Value *)IRB.getInt32(0);
1424 assert(Origin->getType()->isIntegerTy());
1425
1426 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1427 // Try to create additional origin with debug info of the last origin
1428 // instruction. It may provide additional information to the user.
1429 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1430 assert(MS.TrackOrigins);
1431 auto NewDebugLoc = OI->getDebugLoc();
1432 // Origin update with missing or the same debug location provides no
1433 // additional value.
1434 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1435 // Insert update just before the check, so we call runtime only just
1436 // before the report.
1437 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1438 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1439 Origin = updateOrigin(Origin, IRBOrigin);
1440 }
1441 }
1442 }
1443
1444 if (MS.CompileKernel || MS.TrackOrigins)
1445 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1446 else
1447 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1448 // FIXME: Insert UnreachableInst if !MS.Recover?
1449 // This may invalidate some of the following checks and needs to be done
1450 // at the very end.
1451 }
1452
1453 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1454 Value *Origin) {
1455 const DataLayout &DL = F.getDataLayout();
1456 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1457 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1458 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1459 // ZExt cannot convert between vector and scalar
1460 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1461 Value *ConvertedShadow2 =
1462 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1463
1464 if (SizeIndex < kNumberOfAccessSizes) {
1465 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1466 CallBase *CB = IRB.CreateCall(
1467 Fn,
1468 {ConvertedShadow2,
1469 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1470 CB->addParamAttr(0, Attribute::ZExt);
1471 CB->addParamAttr(1, Attribute::ZExt);
1472 } else {
1473 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1474 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1475 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1476 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1477 CallBase *CB = IRB.CreateCall(
1478 Fn,
1479 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1480 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1481 CB->addParamAttr(1, Attribute::ZExt);
1482 CB->addParamAttr(2, Attribute::ZExt);
1483 }
1484 } else {
1485 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1486 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1487 Cmp, &*IRB.GetInsertPoint(),
1488 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1489
1490 IRB.SetInsertPoint(CheckTerm);
1491 insertWarningFn(IRB, Origin);
1492 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1493 }
1494 }
1495
1496 void materializeInstructionChecks(
1497 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1498 const DataLayout &DL = F.getDataLayout();
1499 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1500 // correct origin.
1501 bool Combine = !MS.TrackOrigins;
1502 Instruction *Instruction = InstructionChecks.front().OrigIns;
1503 Value *Shadow = nullptr;
1504 for (const auto &ShadowData : InstructionChecks) {
1505 assert(ShadowData.OrigIns == Instruction);
1506 IRBuilder<> IRB(Instruction);
1507
1508 Value *ConvertedShadow = ShadowData.Shadow;
1509
1510 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1511 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1512 // Skip, value is initialized or const shadow is ignored.
1513 continue;
1514 }
1515 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1516 // Report as the value is definitely uninitialized.
1517 insertWarningFn(IRB, ShadowData.Origin);
1518 if (!MS.Recover)
1519 return; // Always fail and stop here, no need to check the rest.
1520 // Skip the entire instruction.
1521 continue;
1522 }
1523 // Fall back to a runtime check, which can still be optimized out later.
1524 }
1525
1526 if (!Combine) {
1527 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1528 continue;
1529 }
1530
1531 if (!Shadow) {
1532 Shadow = ConvertedShadow;
1533 continue;
1534 }
1535
1536 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1537 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1538 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1539 }
1540
1541 if (Shadow) {
1542 assert(Combine);
1543 IRBuilder<> IRB(Instruction);
1544 materializeOneCheck(IRB, Shadow, nullptr);
1545 }
1546 }
1547
1548 static bool isAArch64SVCount(Type *Ty) {
1549 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1550 return TTy->getName() == "aarch64.svcount";
1551 return false;
1552 }
1553
1554 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1555 // 'target("aarch64.svcount")', but not e.g., <vscale x 4 x i32>.
1556 static bool isScalableNonVectorType(Type *Ty) {
1557 if (!isAArch64SVCount(Ty))
1558 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1559 << "\n");
1560
1561 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1562 }
1563
1564 void materializeChecks() {
1565#ifndef NDEBUG
1566 // For assert below.
1567 SmallPtrSet<Instruction *, 16> Done;
1568#endif
1569
1570 for (auto I = InstrumentationList.begin();
1571 I != InstrumentationList.end();) {
1572 auto OrigIns = I->OrigIns;
1573 // Checks are grouped by the original instruction. We process all checks
1574 // registered via `insertCheckShadow` for an instruction at once.
1575 assert(Done.insert(OrigIns).second);
1576 auto J = std::find_if(I + 1, InstrumentationList.end(),
1577 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1578 return OrigIns != R.OrigIns;
1579 });
1580 // Process all checks of instruction at once.
1581 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1582 I = J;
1583 }
1584
1585 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1586 }
1587
1588 // Sets up the KMSAN context state accessors in the function prologue.
1589 void insertKmsanPrologue(IRBuilder<> &IRB) {
1590 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1591 Constant *Zero = IRB.getInt32(0);
1592 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1593 {Zero, IRB.getInt32(0)}, "param_shadow");
1594 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1595 {Zero, IRB.getInt32(1)}, "retval_shadow");
1596 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1597 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1598 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1599 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1600 MS.VAArgOverflowSizeTLS =
1601 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1602 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1603 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1604 {Zero, IRB.getInt32(5)}, "param_origin");
1605 MS.RetvalOriginTLS =
1606 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1607 {Zero, IRB.getInt32(6)}, "retval_origin");
1608 if (MS.TargetTriple.getArch() == Triple::systemz)
1609 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1610 }
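// The GEP indices 0..6 used by insertKmsanPrologue assume a context-state
// struct laid out roughly like the kernel-side kmsan_context_state (field
// names here are only illustrative):
//
//   struct kmsan_context_state {
//     char param_tls[...];          // index 0: parameter shadow
//     char retval_tls[...];         // index 1: return value shadow
//     char va_arg_tls[...];         // index 2: va_arg shadow
//     char va_arg_origin_tls[...];  // index 3: va_arg origins
//     u64  va_arg_overflow_size;    // index 4
//     u32  param_origin_tls[...];   // index 5: parameter origins
//     u32  retval_origin_tls;       // index 6: return value origin
//   };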
1611
1612 /// Add MemorySanitizer instrumentation to a function.
1613 bool runOnFunction() {
1614 // Iterate all BBs in depth-first order and create shadow instructions
1615 // for all instructions (where applicable).
1616 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1617 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1618 visit(*BB);
1619
1620 // `visit` above only collects instructions. Process them after iterating
1621 // CFG to avoid requirement on CFG transformations.
1622 for (Instruction *I : Instructions)
1623 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1624
1625 // Finalize PHI nodes.
1626 for (PHINode *PN : ShadowPHINodes) {
1627 PHINode *PNS = cast<PHINode>(getShadow(PN));
1628 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1629 size_t NumValues = PN->getNumIncomingValues();
1630 for (size_t v = 0; v < NumValues; v++) {
1631 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1632 if (PNO)
1633 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1634 }
1635 }
1636
1637 VAHelper->finalizeInstrumentation();
1638
1639 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1640 // instrumenting only allocas.
1641 if (InstrumentLifetimeStart) {
1642 for (auto Item : LifetimeStartList) {
1643 instrumentAlloca(*Item.second, Item.first);
1644 AllocaSet.remove(Item.second);
1645 }
1646 }
1647 // Poison the allocas for which we didn't instrument the corresponding
1648 // lifetime intrinsics.
1649 for (AllocaInst *AI : AllocaSet)
1650 instrumentAlloca(*AI);
1651
1652 // Insert shadow value checks.
1653 materializeChecks();
1654
1655 // Delayed instrumentation of StoreInst.
1656 // This may not add new address checks.
1657 materializeStores();
1658
1659 return true;
1660 }
1661
1662 /// Compute the shadow type that corresponds to a given Value.
1663 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1664
1665 /// Compute the shadow type that corresponds to a given Type.
1666 Type *getShadowTy(Type *OrigTy) {
1667 if (!OrigTy->isSized()) {
1668 return nullptr;
1669 }
1670 // For integer type, shadow is the same as the original type.
1671 // This may return weird-sized types like i1.
1672 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1673 return IT;
1674 const DataLayout &DL = F.getDataLayout();
1675 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1676 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1677 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1678 VT->getElementCount());
1679 }
1680 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1681 return ArrayType::get(getShadowTy(AT->getElementType()),
1682 AT->getNumElements());
1683 }
1684 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1685 SmallVector<Type *, 4> Elements;
1686 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1687 Elements.push_back(getShadowTy(ST->getElementType(i)));
1688 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1689 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1690 return Res;
1691 }
1692 if (isScalableNonVectorType(OrigTy)) {
1693 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1694 << "\n");
1695 return OrigTy;
1696 }
1697
1698 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1699 return IntegerType::get(*MS.C, TypeSize);
1700 }
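// For example, getShadowTy above maps:
//   i32             ==> i32
//   <4 x float>     ==> <4 x i32>
//   [8 x i16]       ==> [8 x i16]
//   { i32, double } ==> { i32, i64 }
//   float           ==> i32   (via the final size-based fallback)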
1701
1702 /// Extract combined shadow of struct elements as a bool
1703 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1704 IRBuilder<> &IRB) {
1705 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1706 Value *Aggregator = FalseVal;
1707
1708 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1709 // Combine by ORing together each element's bool shadow
1710 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1711 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1712
1713 if (Aggregator != FalseVal)
1714 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1715 else
1716 Aggregator = ShadowBool;
1717 }
1718
1719 return Aggregator;
1720 }
1721
1722 // Extract combined shadow of array elements
1723 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1724 IRBuilder<> &IRB) {
1725 if (!Array->getNumElements())
1726 return IRB.getIntN(/* width */ 1, /* value */ 0);
1727
1728 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1729 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1730
1731 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1732 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1733 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1734 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1735 }
1736 return Aggregator;
1737 }
1738
1739 /// Convert a shadow value to its flattened variant. The resulting
1740 /// shadow may not necessarily have the same bit width as the input
1741 /// value, but it will always be comparable to zero.
1742 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1743 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1744 return collapseStructShadow(Struct, V, IRB);
1745 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1746 return collapseArrayShadow(Array, V, IRB);
1747 if (isa<VectorType>(V->getType())) {
1748 if (isa<ScalableVectorType>(V->getType()))
1749 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1750 unsigned BitWidth =
1751 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1752 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1753 }
1754 return V;
1755 }
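// For example, convertShadowToScalar collapses a { i32, [2 x i64] } shadow
// to an i1 (the struct path ORs per-element bools), bitcasts a <4 x i32>
// shadow to a single i128, and OR-reduces a <vscale x 4 x i32> shadow to an
// i32. The resulting widths differ, but each result compares against zero
// the same way, which is all the callers rely on.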
1756
1757 // Convert a scalar value to an i1 by comparing with 0
1758 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1759 Type *VTy = V->getType();
1760 if (!VTy->isIntegerTy())
1761 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1762 if (VTy->getIntegerBitWidth() == 1)
1763 // Just converting a bool to a bool, so do nothing.
1764 return V;
1765 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1766 }
1767
1768 Type *ptrToIntPtrType(Type *PtrTy) const {
1769 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1770 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1771 VectTy->getElementCount());
1772 }
1773 assert(PtrTy->isIntOrPtrTy());
1774 return MS.IntptrTy;
1775 }
1776
1777 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1778 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1779 return VectorType::get(
1780 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1781 VectTy->getElementCount());
1782 }
1783 assert(IntPtrTy == MS.IntptrTy);
1784 return MS.PtrTy;
1785 }
1786
1787 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1788 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1789 return ConstantVector::getSplat(
1790 VectTy->getElementCount(),
1791 constToIntPtr(VectTy->getElementType(), C));
1792 }
1793 assert(IntPtrTy == MS.IntptrTy);
1794 return ConstantInt::get(MS.IntptrTy, C);
1795 }
1796
1797 /// Returns the integer shadow offset that corresponds to a given
1798 /// application address, whereby:
1799 ///
1800 /// Offset = (Addr & ~AndMask) ^ XorMask
1801 /// Shadow = ShadowBase + Offset
1802 /// Origin = (OriginBase + Offset) & ~Alignment
1803 ///
1804 /// Note: for efficiency, many shadow mappings only require the XorMask
1805 /// and OriginBase; the AndMask and ShadowBase are often zero.
1806 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1807 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1808 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1809
1810 if (uint64_t AndMask = MS.MapParams->AndMask)
1811 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1812
1813 if (uint64_t XorMask = MS.MapParams->XorMask)
1814 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1815 return OffsetLong;
1816 }
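// Worked example, assuming the usual Linux x86_64 parameters from the
// MemoryMapParams tables earlier in this file (AndMask and ShadowBase zero,
// XorMask 0x500000000000, OriginBase 0x100000000000): for an application
// address 0x700000001000,
//
//   Offset = 0x700000001000 ^ 0x500000000000 = 0x200000001000
//   Shadow = 0x200000001000                     (ShadowBase == 0)
//   Origin = (0x100000000000 + Offset) & ~3     = 0x300000001000
//
// so the fast path emits just a single XOR here.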
1817
1818 /// Compute the shadow and origin addresses corresponding to a given
1819 /// application address.
1820 ///
1821 /// Shadow = ShadowBase + Offset
1822 /// Origin = (OriginBase + Offset) & ~3ULL
1823 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1824 /// a single pointee.
1825 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1826 std::pair<Value *, Value *>
1827 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1828 MaybeAlign Alignment) {
1829 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1830 if (!VectTy) {
1831 assert(Addr->getType()->isPointerTy());
1832 } else {
1833 assert(VectTy->getElementType()->isPointerTy());
1834 }
1835 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1836 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1837 Value *ShadowLong = ShadowOffset;
1838 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1839 ShadowLong =
1840 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1841 }
1842 Value *ShadowPtr = IRB.CreateIntToPtr(
1843 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1844
1845 Value *OriginPtr = nullptr;
1846 if (MS.TrackOrigins) {
1847 Value *OriginLong = ShadowOffset;
1848 uint64_t OriginBase = MS.MapParams->OriginBase;
1849 if (OriginBase != 0)
1850 OriginLong =
1851 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1852 if (!Alignment || *Alignment < kMinOriginAlignment) {
1853 uint64_t Mask = kMinOriginAlignment.value() - 1;
1854 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1855 }
1856 OriginPtr = IRB.CreateIntToPtr(
1857 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1858 }
1859 return std::make_pair(ShadowPtr, OriginPtr);
1860 }
1861
1862 template <typename... ArgsTy>
1863 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1864 ArgsTy... Args) {
1865 if (MS.TargetTriple.getArch() == Triple::systemz) {
1866 IRB.CreateCall(Callee,
1867 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1868 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1869 }
1870
1871 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1872 }
1873
1874 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1875 IRBuilder<> &IRB,
1876 Type *ShadowTy,
1877 bool isStore) {
1878 Value *ShadowOriginPtrs;
1879 const DataLayout &DL = F.getDataLayout();
1880 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1881
1882 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1883 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1884 if (Getter) {
1885 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1886 } else {
1887 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1888 ShadowOriginPtrs = createMetadataCall(
1889 IRB,
1890 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1891 AddrCast, SizeVal);
1892 }
1893 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1894 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1895 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1896
1897 return std::make_pair(ShadowPtr, OriginPtr);
1898 }
1899
1900 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1901 /// a single pointee.
1902 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1903 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1904 IRBuilder<> &IRB,
1905 Type *ShadowTy,
1906 bool isStore) {
1907 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1908 if (!VectTy) {
1909 assert(Addr->getType()->isPointerTy());
1910 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1911 }
1912
1913 // TODO: Support callbacks with vectors of addresses.
1914 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1915 Value *ShadowPtrs = ConstantInt::getNullValue(
1916 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1917 Value *OriginPtrs = nullptr;
1918 if (MS.TrackOrigins)
1919 OriginPtrs = ConstantInt::getNullValue(
1920 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1921 for (unsigned i = 0; i < NumElements; ++i) {
1922 Value *OneAddr =
1923 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1924 auto [ShadowPtr, OriginPtr] =
1925 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1926
1927 ShadowPtrs = IRB.CreateInsertElement(
1928 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1929 if (MS.TrackOrigins)
1930 OriginPtrs = IRB.CreateInsertElement(
1931 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1932 }
1933 return {ShadowPtrs, OriginPtrs};
1934 }
1935
1936 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1937 Type *ShadowTy,
1938 MaybeAlign Alignment,
1939 bool isStore) {
1940 if (MS.CompileKernel)
1941 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1942 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1943 }
1944
1945 /// Compute the shadow address for a given function argument.
1946 ///
1947 /// Shadow = ParamTLS+ArgOffset.
1948 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1949 return IRB.CreatePtrAdd(MS.ParamTLS,
1950 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1951 }
1952
1953 /// Compute the origin address for a given function argument.
1954 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1955 if (!MS.TrackOrigins)
1956 return nullptr;
1957 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1958 ConstantInt::get(MS.IntptrTy, ArgOffset),
1959 "_msarg_o");
1960 }
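// Both helpers index into flat per-thread buffers: argument shadows are laid
// out back to back in MS.ParamTLS at offsets rounded up to
// kShadowTLSAlignment, with origins (when enabled) at the same offsets in
// MS.ParamOriginTLS. For instance, for f(i32, double) the i32 shadow would
// occupy offset 0 and the double shadow offset 8, matching the ArgOffset
// bookkeeping in getShadow() below and in the call-site instrumentation.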
1961
1962 /// Compute the shadow address for a retval.
1963 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1964 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1965 }
1966
1967 /// Compute the origin address for a retval.
1968 Value *getOriginPtrForRetval() {
1969 // We keep a single origin for the entire retval. Might be too optimistic.
1970 return MS.RetvalOriginTLS;
1971 }
1972
1973 /// Set SV to be the shadow value for V.
1974 void setShadow(Value *V, Value *SV) {
1975 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1976 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1977 }
1978
1979 /// Set Origin to be the origin value for V.
1980 void setOrigin(Value *V, Value *Origin) {
1981 if (!MS.TrackOrigins)
1982 return;
1983 assert(!OriginMap.count(V) && "Values may only have one origin");
1984 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1985 OriginMap[V] = Origin;
1986 }
1987
1988 Constant *getCleanShadow(Type *OrigTy) {
1989 Type *ShadowTy = getShadowTy(OrigTy);
1990 if (!ShadowTy)
1991 return nullptr;
1992 return Constant::getNullValue(ShadowTy);
1993 }
1994
1995 /// Create a clean shadow value for a given value.
1996 ///
1997 /// Clean shadow (all zeroes) means all bits of the value are defined
1998 /// (initialized).
1999 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2000
2001 /// Create a dirty shadow of a given shadow type.
2002 Constant *getPoisonedShadow(Type *ShadowTy) {
2003 assert(ShadowTy);
2004 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2005 return Constant::getAllOnesValue(ShadowTy);
2006 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2007 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2008 getPoisonedShadow(AT->getElementType()));
2009 return ConstantArray::get(AT, Vals);
2010 }
2011 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2012 SmallVector<Constant *, 4> Vals;
2013 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2014 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2015 return ConstantStruct::get(ST, Vals);
2016 }
2017 llvm_unreachable("Unexpected shadow type");
2018 }
2019
2020 /// Create a dirty shadow for a given value.
2021 Constant *getPoisonedShadow(Value *V) {
2022 Type *ShadowTy = getShadowTy(V);
2023 if (!ShadowTy)
2024 return nullptr;
2025 return getPoisonedShadow(ShadowTy);
2026 }
2027
2028 /// Create a clean (zero) origin.
2029 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2030
2031 /// Get the shadow value for a given Value.
2032 ///
2033 /// This function either returns the value set earlier with setShadow,
2034 /// or extracts it from ParamTLS (for function arguments).
2035 Value *getShadow(Value *V) {
2036 if (Instruction *I = dyn_cast<Instruction>(V)) {
2037 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2038 return getCleanShadow(V);
2039 // For instructions the shadow is already stored in the map.
2040 Value *Shadow = ShadowMap[V];
2041 if (!Shadow) {
2042 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2043 assert(Shadow && "No shadow for a value");
2044 }
2045 return Shadow;
2046 }
2047 // Handle fully undefined values
2048 // (partially undefined constant vectors are handled later)
2049 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2050 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2051 : getCleanShadow(V);
2052 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2053 return AllOnes;
2054 }
2055 if (Argument *A = dyn_cast<Argument>(V)) {
2056 // For arguments we compute the shadow on demand and store it in the map.
2057 Value *&ShadowPtr = ShadowMap[V];
2058 if (ShadowPtr)
2059 return ShadowPtr;
2060 Function *F = A->getParent();
2061 IRBuilder<> EntryIRB(FnPrologueEnd);
2062 unsigned ArgOffset = 0;
2063 const DataLayout &DL = F->getDataLayout();
2064 for (auto &FArg : F->args()) {
2065 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2066 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2067 ? "vscale not fully supported\n"
2068 : "Arg is not sized\n"));
2069 if (A == &FArg) {
2070 ShadowPtr = getCleanShadow(V);
2071 setOrigin(A, getCleanOrigin());
2072 break;
2073 }
2074 continue;
2075 }
2076
2077 unsigned Size = FArg.hasByValAttr()
2078 ? DL.getTypeAllocSize(FArg.getParamByValType())
2079 : DL.getTypeAllocSize(FArg.getType());
2080
2081 if (A == &FArg) {
2082 bool Overflow = ArgOffset + Size > kParamTLSSize;
2083 if (FArg.hasByValAttr()) {
2084 // ByVal pointer itself has clean shadow. We copy the actual
2085 // argument shadow to the underlying memory.
2086 // Figure out maximal valid memcpy alignment.
2087 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2088 FArg.getParamAlign(), FArg.getParamByValType());
2089 Value *CpShadowPtr, *CpOriginPtr;
2090 std::tie(CpShadowPtr, CpOriginPtr) =
2091 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2092 /*isStore*/ true);
2093 if (!PropagateShadow || Overflow) {
2094 // ParamTLS overflow.
2095 EntryIRB.CreateMemSet(
2096 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2097 Size, ArgAlign);
2098 } else {
2099 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2100 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2101 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2102 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2103 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2104
2105 if (MS.TrackOrigins) {
2106 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2107 // FIXME: OriginSize should be:
2108 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2109 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2110 EntryIRB.CreateMemCpy(
2111 CpOriginPtr,
2112 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2113 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2114 OriginSize);
2115 }
2116 }
2117 }
2118
2119 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2120 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2121 ShadowPtr = getCleanShadow(V);
2122 setOrigin(A, getCleanOrigin());
2123 } else {
2124 // Shadow over TLS
2125 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2126 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2127 kShadowTLSAlignment);
2128 if (MS.TrackOrigins) {
2129 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2130 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2131 }
2132 }
2133 LLVM_DEBUG(dbgs()
2134 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2135 break;
2136 }
2137
2138 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2139 }
2140 assert(ShadowPtr && "Could not find shadow for an argument");
2141 return ShadowPtr;
2142 }
2143
2144 // Check for partially-undefined constant vectors
2145 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2146 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2147 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2148 PoisonUndefVectors) {
2149 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2150 SmallVector<Constant *, 32> ShadowVector(NumElems);
2151 for (unsigned i = 0; i != NumElems; ++i) {
2152 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2153 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2154 : getCleanShadow(Elem);
2155 }
2156
2157 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2158 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2159 << *ShadowConstant << "\n");
2160
2161 return ShadowConstant;
2162 }
2163
2164 // TODO: partially-undefined constant arrays, structures, and nested types
2165
2166 // For everything else the shadow is zero.
2167 return getCleanShadow(V);
2168 }
2169
2170 /// Get the shadow for i-th argument of the instruction I.
2171 Value *getShadow(Instruction *I, int i) {
2172 return getShadow(I->getOperand(i));
2173 }
2174
2175 /// Get the origin for a value.
2176 Value *getOrigin(Value *V) {
2177 if (!MS.TrackOrigins)
2178 return nullptr;
2179 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2180 return getCleanOrigin();
2182 "Unexpected value type in getOrigin()");
2183 if (Instruction *I = dyn_cast<Instruction>(V)) {
2184 if (I->getMetadata(LLVMContext::MD_nosanitize))
2185 return getCleanOrigin();
2186 }
2187 Value *Origin = OriginMap[V];
2188 assert(Origin && "Missing origin");
2189 return Origin;
2190 }
2191
2192 /// Get the origin for i-th argument of the instruction I.
2193 Value *getOrigin(Instruction *I, int i) {
2194 return getOrigin(I->getOperand(i));
2195 }
2196
2197 /// Remember the place where a shadow check should be inserted.
2198 ///
2199 /// This location will be later instrumented with a check that will print a
2200 /// UMR warning in runtime if the shadow value is not 0.
2201 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2202 assert(Shadow);
2203 if (!InsertChecks)
2204 return;
2205
2206 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2207 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2208 << *OrigIns << "\n");
2209 return;
2210 }
2211
2212 Type *ShadowTy = Shadow->getType();
2213 if (isScalableNonVectorType(ShadowTy)) {
2214 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2215 << " before " << *OrigIns << "\n");
2216 return;
2217 }
2218#ifndef NDEBUG
2219 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2220 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2221 "Can only insert checks for integer, vector, and aggregate shadow "
2222 "types");
2223#endif
2224 InstrumentationList.push_back(
2225 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2226 }
2227
2228 /// Get shadow for value, and remember the place where a shadow check should
2229 /// be inserted.
2230 ///
2231 /// This location will be later instrumented with a check that will print a
2232 /// UMR warning in runtime if the value is not fully defined.
2233 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2234 assert(Val);
2235 Value *Shadow, *Origin;
2236 if (ClCheckConstantShadow) {
2237 Shadow = getShadow(Val);
2238 if (!Shadow)
2239 return;
2240 Origin = getOrigin(Val);
2241 } else {
2242 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2243 if (!Shadow)
2244 return;
2245 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2246 }
2247 insertCheckShadow(Shadow, Origin, OrigIns);
2248 }
2249
2250 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2251 switch (a) {
2252 case AtomicOrdering::NotAtomic:
2253 return AtomicOrdering::NotAtomic;
2254 case AtomicOrdering::Unordered:
2255 case AtomicOrdering::Monotonic:
2256 case AtomicOrdering::Release:
2257 return AtomicOrdering::Release;
2258 case AtomicOrdering::Acquire:
2259 case AtomicOrdering::AcquireRelease:
2260 return AtomicOrdering::AcquireRelease;
2261 case AtomicOrdering::SequentiallyConsistent:
2262 return AtomicOrdering::SequentiallyConsistent;
2263 }
2264 llvm_unreachable("Unknown ordering");
2265 }
2266
2267 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2268 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2269 uint32_t OrderingTable[NumOrderings] = {};
2270
2271 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2272 OrderingTable[(int)AtomicOrderingCABI::release] =
2273 (int)AtomicOrderingCABI::release;
2274 OrderingTable[(int)AtomicOrderingCABI::consume] =
2275 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2276 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2277 (int)AtomicOrderingCABI::acq_rel;
2278 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2279 (int)AtomicOrderingCABI::seq_cst;
2280
2281 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2282 }
2283
2284 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2285 switch (a) {
2286 case AtomicOrdering::NotAtomic:
2287 return AtomicOrdering::NotAtomic;
2288 case AtomicOrdering::Unordered:
2289 case AtomicOrdering::Monotonic:
2290 case AtomicOrdering::Acquire:
2291 return AtomicOrdering::Acquire;
2292 case AtomicOrdering::Release:
2293 case AtomicOrdering::AcquireRelease:
2294 return AtomicOrdering::AcquireRelease;
2295 case AtomicOrdering::SequentiallyConsistent:
2296 return AtomicOrdering::SequentiallyConsistent;
2297 }
2298 llvm_unreachable("Unknown ordering");
2299 }
2300
2301 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2302 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2303 uint32_t OrderingTable[NumOrderings] = {};
2304
2305 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2306 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2307 OrderingTable[(int)AtomicOrderingCABI::consume] =
2308 (int)AtomicOrderingCABI::acquire;
2309 OrderingTable[(int)AtomicOrderingCABI::release] =
2310 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2311 (int)AtomicOrderingCABI::acq_rel;
2312 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2313 (int)AtomicOrderingCABI::seq_cst;
2314
2315 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2316 }
2317
2318 // ------------------- Visitors.
2319 using InstVisitor<MemorySanitizerVisitor>::visit;
2320 void visit(Instruction &I) {
2321 if (I.getMetadata(LLVMContext::MD_nosanitize))
2322 return;
2323 // Don't want to visit if we're in the prologue
2324 if (isInPrologue(I))
2325 return;
2326 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2327 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2328 // We still need to set the shadow and origin to clean values.
2329 setShadow(&I, getCleanShadow(&I));
2330 setOrigin(&I, getCleanOrigin());
2331 return;
2332 }
2333
2334 Instructions.push_back(&I);
2335 }
2336
2337 /// Instrument LoadInst
2338 ///
2339 /// Loads the corresponding shadow and (optionally) origin.
2340 /// Optionally, checks that the load address is fully defined.
2341 void visitLoadInst(LoadInst &I) {
2342 assert(I.getType()->isSized() && "Load type must have size");
2343 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2344 NextNodeIRBuilder IRB(&I);
2345 Type *ShadowTy = getShadowTy(&I);
2346 Value *Addr = I.getPointerOperand();
2347 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2348 const Align Alignment = I.getAlign();
2349 if (PropagateShadow) {
2350 std::tie(ShadowPtr, OriginPtr) =
2351 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2352 setShadow(&I,
2353 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2354 } else {
2355 setShadow(&I, getCleanShadow(&I));
2356 }
2357
2358 if (ClCheckAccessAddress)
2359 insertCheckShadowOf(I.getPointerOperand(), &I);
2360
2361 if (I.isAtomic())
2362 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2363
2364 if (MS.TrackOrigins) {
2365 if (PropagateShadow) {
2366 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2367 setOrigin(
2368 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2369 } else {
2370 setOrigin(&I, getCleanOrigin());
2371 }
2372 }
2373 }
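// As a rough sketch, with origins disabled and a userspace mapping that only
// needs the XorMask, an 8-byte load
//   %x = load i64, ptr %p, align 8
// is followed by instrumentation along these lines:
//   %1 = ptrtoint ptr %p to i64
//   %2 = xor i64 %1, <XorMask>
//   %3 = inttoptr i64 %2 to ptr
//   %_msld = load i64, ptr %3, align 8
// plus, if ClCheckAccessAddress is set, a deferred check that the shadow of
// %p itself is clean.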
2374
2375 /// Instrument StoreInst
2376 ///
2377 /// Stores the corresponding shadow and (optionally) origin.
2378 /// Optionally, checks that the store address is fully defined.
2379 void visitStoreInst(StoreInst &I) {
2380 StoreList.push_back(&I);
2381 if (ClCheckAccessAddress)
2382 insertCheckShadowOf(I.getPointerOperand(), &I);
2383 }
2384
2385 void handleCASOrRMW(Instruction &I) {
2386 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2387
2388 IRBuilder<> IRB(&I);
2389 Value *Addr = I.getOperand(0);
2390 Value *Val = I.getOperand(1);
2391 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2392 /*isStore*/ true)
2393 .first;
2394
2395 if (ClCheckAccessAddress)
2396 insertCheckShadowOf(Addr, &I);
2397
2398 // Only test the conditional argument of cmpxchg instruction.
2399 // The other argument can potentially be uninitialized, but we can not
2400 // detect this situation reliably without possible false positives.
2401 if (isa<AtomicCmpXchgInst>(I))
2402 insertCheckShadowOf(Val, &I);
2403
2404 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2405
2406 setShadow(&I, getCleanShadow(&I));
2407 setOrigin(&I, getCleanOrigin());
2408 }
2409
2410 void visitAtomicRMWInst(AtomicRMWInst &I) {
2411 handleCASOrRMW(I);
2412 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2413 }
2414
2415 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2416 handleCASOrRMW(I);
2417 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2418 }
2419
2420 // Vector manipulation.
2421 void visitExtractElementInst(ExtractElementInst &I) {
2422 insertCheckShadowOf(I.getOperand(1), &I);
2423 IRBuilder<> IRB(&I);
2424 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2425 "_msprop"));
2426 setOrigin(&I, getOrigin(&I, 0));
2427 }
2428
2429 void visitInsertElementInst(InsertElementInst &I) {
2430 insertCheckShadowOf(I.getOperand(2), &I);
2431 IRBuilder<> IRB(&I);
2432 auto *Shadow0 = getShadow(&I, 0);
2433 auto *Shadow1 = getShadow(&I, 1);
2434 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2435 "_msprop"));
2436 setOriginForNaryOp(I);
2437 }
2438
2439 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2440 IRBuilder<> IRB(&I);
2441 auto *Shadow0 = getShadow(&I, 0);
2442 auto *Shadow1 = getShadow(&I, 1);
2443 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2444 "_msprop"));
2445 setOriginForNaryOp(I);
2446 }
2447
2448 // Casts.
2449 void visitSExtInst(SExtInst &I) {
2450 IRBuilder<> IRB(&I);
2451 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2452 setOrigin(&I, getOrigin(&I, 0));
2453 }
2454
2455 void visitZExtInst(ZExtInst &I) {
2456 IRBuilder<> IRB(&I);
2457 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2458 setOrigin(&I, getOrigin(&I, 0));
2459 }
2460
2461 void visitTruncInst(TruncInst &I) {
2462 IRBuilder<> IRB(&I);
2463 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2464 setOrigin(&I, getOrigin(&I, 0));
2465 }
2466
2467 void visitBitCastInst(BitCastInst &I) {
2468 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2469 // a musttail call and a ret, don't instrument. New instructions are not
2470 // allowed after a musttail call.
2471 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2472 if (CI->isMustTailCall())
2473 return;
2474 IRBuilder<> IRB(&I);
2475 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2476 setOrigin(&I, getOrigin(&I, 0));
2477 }
2478
2479 void visitPtrToIntInst(PtrToIntInst &I) {
2480 IRBuilder<> IRB(&I);
2481 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2482 "_msprop_ptrtoint"));
2483 setOrigin(&I, getOrigin(&I, 0));
2484 }
2485
2486 void visitIntToPtrInst(IntToPtrInst &I) {
2487 IRBuilder<> IRB(&I);
2488 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2489 "_msprop_inttoptr"));
2490 setOrigin(&I, getOrigin(&I, 0));
2491 }
2492
2493 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2494 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2495 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2496 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2497 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2498 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2499
2500 /// Propagate shadow for bitwise AND.
2501 ///
2502 /// This code is exact, i.e. if, for example, a bit in the left argument
2503 /// is defined and 0, then neither the value nor the definedness of the
2504 /// corresponding bit in B affects the resulting shadow.
2505 void visitAnd(BinaryOperator &I) {
2506 IRBuilder<> IRB(&I);
2507 // "And" of 0 and a poisoned value results in unpoisoned value.
2508 // 1&1 => 1; 0&1 => 0; p&1 => p;
2509 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2510 // 1&p => p; 0&p => 0; p&p => p;
2511 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2512 Value *S1 = getShadow(&I, 0);
2513 Value *S2 = getShadow(&I, 1);
2514 Value *V1 = I.getOperand(0);
2515 Value *V2 = I.getOperand(1);
2516 if (V1->getType() != S1->getType()) {
2517 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2518 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2519 }
2520 Value *S1S2 = IRB.CreateAnd(S1, S2);
2521 Value *V1S2 = IRB.CreateAnd(V1, S2);
2522 Value *S1V2 = IRB.CreateAnd(S1, V2);
2523 setShadow(&I, IRB.CreateOr({S1S2, V1S2, S1V2}));
2524 setOriginForNaryOp(I);
2525 }
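// Bit-level example of S = (S1 & S2) | (V1 & S2) | (S1 & V2): take a fully
// defined A = 0b01 (S1 = 0b00) and a B whose low bit is poisoned
// (S2 = 0b01). Then S = 0b01, because bit 0 of the result is 1 & p, which
// is unknown. With A = 0b00 instead, the same formula yields S = 0b00:
// 0 & p is always 0, so the poisoned bit of B cannot leak into the result.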
2526
2527 void visitOr(BinaryOperator &I) {
2528 IRBuilder<> IRB(&I);
2529 // "Or" of 1 and a poisoned value results in unpoisoned value:
2530 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2531 // 1|0 => 1; 0|0 => 0; p|0 => p;
2532 // 1|p => 1; 0|p => p; p|p => p;
2533 //
2534 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2535 //
2536 // If the "disjoint OR" property is violated, the result is poison, and
2537 // hence the entire shadow is uninitialized:
2538 // S = S | SignExt(V1 & V2 != 0)
2539 Value *S1 = getShadow(&I, 0);
2540 Value *S2 = getShadow(&I, 1);
2541 Value *V1 = I.getOperand(0);
2542 Value *V2 = I.getOperand(1);
2543 if (V1->getType() != S1->getType()) {
2544 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2545 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2546 }
2547
2548 Value *NotV1 = IRB.CreateNot(V1);
2549 Value *NotV2 = IRB.CreateNot(V2);
2550
2551 Value *S1S2 = IRB.CreateAnd(S1, S2);
2552 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2553 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2554
2555 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2556
2557 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2558 Value *V1V2 = IRB.CreateAnd(V1, V2);
2559 Value *DisjointOrShadow = IRB.CreateSExt(
2560 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2561 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2562 }
2563
2564 setShadow(&I, S);
2565 setOriginForNaryOp(I);
2566 }
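// The dual example for OR: with a fully defined A = 0b10 (S1 = 0b00) and a
// B whose low bit is poisoned (S2 = 0b01), ~V1 & S2 = 0b01 and so S = 0b01,
// since 0 | p is still p. With A = 0b11 instead, every term is zero and
// S = 0b00: 1 | p is always 1, so the result is fully defined.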
2567
2568 /// Default propagation of shadow and/or origin.
2569 ///
2570 /// This class implements the general case of shadow propagation, used in all
2571 /// cases where we don't know and/or don't care about what the operation
2572 /// actually does. It converts all input shadow values to a common type
2573 /// (extending or truncating as necessary), and bitwise OR's them.
2574 ///
2575 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2576 /// fully initialized), and less prone to false positives.
2577 ///
2578 /// This class also implements the general case of origin propagation. For a
2579 /// Nary operation, result origin is set to the origin of an argument that is
2580 /// not entirely initialized. If there is more than one such argument, the
2581 /// rightmost of them is picked. It does not matter which one is picked if all
2582 /// arguments are initialized.
2583 template <bool CombineShadow> class Combiner {
2584 Value *Shadow = nullptr;
2585 Value *Origin = nullptr;
2586 IRBuilder<> &IRB;
2587 MemorySanitizerVisitor *MSV;
2588
2589 public:
2590 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2591 : IRB(IRB), MSV(MSV) {}
2592
2593 /// Add a pair of shadow and origin values to the mix.
2594 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2595 if (CombineShadow) {
2596 assert(OpShadow);
2597 if (!Shadow)
2598 Shadow = OpShadow;
2599 else {
2600 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2601 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2602 }
2603 }
2604
2605 if (MSV->MS.TrackOrigins) {
2606 assert(OpOrigin);
2607 if (!Origin) {
2608 Origin = OpOrigin;
2609 } else {
2610 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2611 // No point in adding something that might result in 0 origin value.
2612 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2613 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2614 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2615 }
2616 }
2617 }
2618 return *this;
2619 }
2620
2621 /// Add an application value to the mix.
2622 Combiner &Add(Value *V) {
2623 Value *OpShadow = MSV->getShadow(V);
2624 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2625 return Add(OpShadow, OpOrigin);
2626 }
2627
2628 /// Set the current combined values as the given instruction's shadow
2629 /// and origin.
2630 void Done(Instruction *I) {
2631 if (CombineShadow) {
2632 assert(Shadow);
2633 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2634 MSV->setShadow(I, Shadow);
2635 }
2636 if (MSV->MS.TrackOrigins) {
2637 assert(Origin);
2638 MSV->setOrigin(I, Origin);
2639 }
2640 }
2641
2642 /// Store the current combined value at the specified origin
2643 /// location.
2644 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2645 if (MSV->MS.TrackOrigins) {
2646 assert(Origin);
2647 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2648 }
2649 }
2650 };
2651
2652 using ShadowAndOriginCombiner = Combiner<true>;
2653 using OriginCombiner = Combiner<false>;
2654
2655 /// Propagate origin for arbitrary operation.
2656 void setOriginForNaryOp(Instruction &I) {
2657 if (!MS.TrackOrigins)
2658 return;
2659 IRBuilder<> IRB(&I);
2660 OriginCombiner OC(this, IRB);
2661 for (Use &Op : I.operands())
2662 OC.Add(Op.get());
2663 OC.Done(&I);
2664 }
2665
2666 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2667 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2668 "Vector of pointers is not a valid shadow type");
2669 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2670 Ty->getScalarSizeInBits()
2671 : Ty->getPrimitiveSizeInBits();
2672 }
2673
2674 /// Cast between two shadow types, extending or truncating as
2675 /// necessary.
2676 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2677 bool Signed = false) {
2678 Type *srcTy = V->getType();
2679 if (srcTy == dstTy)
2680 return V;
2681 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2682 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2683 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2684 return IRB.CreateICmpNE(V, getCleanShadow(V));
2685
2686 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2687 return IRB.CreateIntCast(V, dstTy, Signed);
2688 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2689 cast<VectorType>(dstTy)->getElementCount() ==
2690 cast<VectorType>(srcTy)->getElementCount())
2691 return IRB.CreateIntCast(V, dstTy, Signed);
2692 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2693 Value *V2 =
2694 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2695 return IRB.CreateBitCast(V2, dstTy);
2696 // TODO: handle struct types.
2697 }
2698
2699 /// Cast an application value to the type of its own shadow.
2700 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2701 Type *ShadowTy = getShadowTy(V);
2702 if (V->getType() == ShadowTy)
2703 return V;
2704 if (V->getType()->isPtrOrPtrVectorTy())
2705 return IRB.CreatePtrToInt(V, ShadowTy);
2706 else
2707 return IRB.CreateBitCast(V, ShadowTy);
2708 }
2709
2710 /// Propagate shadow for arbitrary operation.
2711 void handleShadowOr(Instruction &I) {
2712 IRBuilder<> IRB(&I);
2713 ShadowAndOriginCombiner SC(this, IRB);
2714 for (Use &Op : I.operands())
2715 SC.Add(Op.get());
2716 SC.Done(&I);
2717 }
2718
2719 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2720 // of elements.
2721 //
2722 // For example, suppose we have:
2723 // VectorA: <a1, a2, a3, a4, a5, a6>
2724 // VectorB: <b1, b2, b3, b4, b5, b6>
2725 // ReductionFactor: 3.
2726 // The output would be:
2727 // <a1|a2|a3, a4|a5|a6, b1|b2|b3, b4|b5|b6>
2728 //
2729 // This is convenient for instrumenting horizontal add/sub.
2730 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2731 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2732 Value *VectorA, Value *VectorB) {
2733 assert(isa<FixedVectorType>(VectorA->getType()));
2734 unsigned TotalNumElems =
2735 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2736
2737 if (VectorB) {
2738 assert(VectorA->getType() == VectorB->getType());
2739 TotalNumElems = TotalNumElems * 2;
2740 }
2741
2742 assert(TotalNumElems % ReductionFactor == 0);
2743
2744 Value *Or = nullptr;
2745
2746 IRBuilder<> IRB(&I);
2747 for (unsigned i = 0; i < ReductionFactor; i++) {
2748 SmallVector<int, 16> Mask;
2749 for (unsigned X = 0; X < TotalNumElems; X += ReductionFactor)
2750 Mask.push_back(X + i);
2751
2752 Value *Masked;
2753 if (VectorB)
2754 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2755 else
2756 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2757
2758 if (Or)
2759 Or = IRB.CreateOr(Or, Masked);
2760 else
2761 Or = Masked;
2762 }
2763
2764 return Or;
2765 }
2766
2767 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2768 /// fields.
2769 ///
2770 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2771 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2772 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I) {
2773 assert(I.arg_size() == 1 || I.arg_size() == 2);
2774
2775 assert(I.getType()->isVectorTy());
2776 assert(I.getArgOperand(0)->getType()->isVectorTy());
2777
2778 [[maybe_unused]] FixedVectorType *ParamType =
2779 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2780 assert((I.arg_size() != 2) ||
2781 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2782 [[maybe_unused]] FixedVectorType *ReturnType =
2783 cast<FixedVectorType>(I.getType());
2784 assert(ParamType->getNumElements() * I.arg_size() ==
2785 2 * ReturnType->getNumElements());
2786
2787 IRBuilder<> IRB(&I);
2788
2789 // Horizontal OR of shadow
2790 Value *FirstArgShadow = getShadow(&I, 0);
2791 Value *SecondArgShadow = nullptr;
2792 if (I.arg_size() == 2)
2793 SecondArgShadow = getShadow(&I, 1);
2794
2795 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2796 SecondArgShadow);
2797
2798 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2799
2800 setShadow(&I, OrShadow);
2801 setOriginForNaryOp(I);
2802 }
2803
2804 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2805 /// fields, with the parameters reinterpreted to have elements of a specified
2806 /// width. For example:
2807 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2808 /// conceptually operates on
2809 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2810 /// and can be handled with ReinterpretElemWidth == 16.
2811 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I,
2812 int ReinterpretElemWidth) {
2813 assert(I.arg_size() == 1 || I.arg_size() == 2);
2814
2815 assert(I.getType()->isVectorTy());
2816 assert(I.getArgOperand(0)->getType()->isVectorTy());
2817
2818 FixedVectorType *ParamType =
2819 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2820 assert((I.arg_size() != 2) ||
2821 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2822
2823 [[maybe_unused]] FixedVectorType *ReturnType =
2824 cast<FixedVectorType>(I.getType());
2825 assert(ParamType->getNumElements() * I.arg_size() ==
2826 2 * ReturnType->getNumElements());
2827
2828 IRBuilder<> IRB(&I);
2829
2830 FixedVectorType *ReinterpretShadowTy = nullptr;
2831 assert(isAligned(Align(ReinterpretElemWidth),
2832 ParamType->getPrimitiveSizeInBits()));
2833 ReinterpretShadowTy = FixedVectorType::get(
2834 IRB.getIntNTy(ReinterpretElemWidth),
2835 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2836
2837 // Horizontal OR of shadow
2838 Value *FirstArgShadow = getShadow(&I, 0);
2839 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2840
2841 // If we had two parameters each with an odd number of elements, the total
2842 // number of elements is even, but we have never seen this in extant
2843 // instruction sets, so we enforce that each parameter must have an even
2844 // number of elements.
2845 assert(isAligned(
2846 Align(2),
2847 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2848
2849 Value *SecondArgShadow = nullptr;
2850 if (I.arg_size() == 2) {
2851 SecondArgShadow = getShadow(&I, 1);
2852 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2853 }
2854
2855 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2856 SecondArgShadow);
2857
2858 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2859
2860 setShadow(&I, OrShadow);
2861 setOriginForNaryOp(I);
2862 }
2863
2864 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2865
2866 // Handle multiplication by constant.
2867 //
2868 // Handle a special case of multiplication by constant that may have one or
2869 // more zeros in the lower bits. This makes the corresponding number of lower bits
2870 // of the result zero as well. We model it by shifting the other operand
2871 // shadow left by the required number of bits. Effectively, we transform
2872 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2873 // We use multiplication by 2**N instead of shift to cover the case of
2874 // multiplication by 0, which may occur in some elements of a vector operand.
2875 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2876 Value *OtherArg) {
2877 Constant *ShadowMul;
2878 Type *Ty = ConstArg->getType();
2879 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2880 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2881 Type *EltTy = VTy->getElementType();
2882 SmallVector<Constant *, 16> Elements;
2883 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2884 if (ConstantInt *Elt =
2885 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2886 const APInt &V = Elt->getValue();
2887 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2888 Elements.push_back(ConstantInt::get(EltTy, V2));
2889 } else {
2890 Elements.push_back(ConstantInt::get(EltTy, 1));
2891 }
2892 }
2893 ShadowMul = ConstantVector::get(Elements);
2894 } else {
2895 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2896 const APInt &V = Elt->getValue();
2897 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2898 ShadowMul = ConstantInt::get(Ty, V2);
2899 } else {
2900 ShadowMul = ConstantInt::get(Ty, 1);
2901 }
2902 }
2903
2904 IRBuilder<> IRB(&I);
2905 setShadow(&I,
2906 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2907 setOrigin(&I, getOrigin(OtherArg));
2908 }
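// Worked example: for X * 24, the constant is 24 = 3 * 2**3, countr_zero()
// is 3, and ShadowMul becomes 8, so the result shadow is Sx * 8, i.e. Sx
// shifted left by three bits. The three low bits of the product are zero no
// matter what X is, so their shadow is correctly clean. A constant element
// of 0 turns its ShadowMul entry into 0 as well, which is exact because
// X * 0 is fully defined.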
2909
2910 void visitMul(BinaryOperator &I) {
2911 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2912 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2913 if (constOp0 && !constOp1)
2914 handleMulByConstant(I, constOp0, I.getOperand(1));
2915 else if (constOp1 && !constOp0)
2916 handleMulByConstant(I, constOp1, I.getOperand(0));
2917 else
2918 handleShadowOr(I);
2919 }
2920
2921 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2922 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2923 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2924 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2925 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2926 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2927
2928 void handleIntegerDiv(Instruction &I) {
2929 IRBuilder<> IRB(&I);
2930 // Strict on the second argument.
2931 insertCheckShadowOf(I.getOperand(1), &I);
2932 setShadow(&I, getShadow(&I, 0));
2933 setOrigin(&I, getOrigin(&I, 0));
2934 }
2935
2936 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2937 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2938 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2939 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2940
2941 // Floating point division is side-effect free. We cannot require that the
2942 // divisor be fully initialized, so we must propagate shadow instead. See PR37523.
2943 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2944 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2945
2946 /// Instrument == and != comparisons.
2947 ///
2948 /// Sometimes the comparison result is known even if some of the bits of the
2949 /// arguments are not.
2950 void handleEqualityComparison(ICmpInst &I) {
2951 IRBuilder<> IRB(&I);
2952 Value *A = I.getOperand(0);
2953 Value *B = I.getOperand(1);
2954 Value *Sa = getShadow(A);
2955 Value *Sb = getShadow(B);
2956
2957 // Get rid of pointers and vectors of pointers.
2958 // For ints (and vectors of ints), types of A and Sa match,
2959 // and this is a no-op.
2960 A = IRB.CreatePointerCast(A, Sa->getType());
2961 B = IRB.CreatePointerCast(B, Sb->getType());
2962
2963 // A == B <==> (C = A^B) == 0
2964 // A != B <==> (C = A^B) != 0
2965 // Sc = Sa | Sb
2966 Value *C = IRB.CreateXor(A, B);
2967 Value *Sc = IRB.CreateOr(Sa, Sb);
2968 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2969 // Result is defined if one of the following is true
2970 // * there is a defined 1 bit in C
2971 // * C is fully defined
2972 // Si = !(C & ~Sc) && Sc
2973 Value *Zero = Constant::getNullValue(Sc->getType());
2974 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2975 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2976 Value *RHS =
2977 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2978 Value *Si = IRB.CreateAnd(LHS, RHS);
2979 Si->setName("_msprop_icmp");
2980 setShadow(&I, Si);
2981 setOriginForNaryOp(I);
2982 }
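// Worked example for Si = !(C & ~Sc) && Sc: let A = 0b10 be fully defined
// and let B = 0b0? have a poisoned low bit (Sb = 0b01). Then Sc = 0b01 and
// C = A ^ B has a defined 1 in its high bit, so C & ~Sc != 0 and Si is
// false: A != B regardless of the unknown bit. With A = 0b00 instead,
// C & ~Sc == 0 while Sc != 0, so Si is true, since the outcome of the
// comparison really does depend on the poisoned bit.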
2983
2984 /// Instrument relational comparisons.
2985 ///
2986 /// This function does exact shadow propagation for all relational
2987 /// comparisons of integers, pointers and vectors of those.
2988 /// FIXME: output seems suboptimal when one of the operands is a constant
2989 void handleRelationalComparisonExact(ICmpInst &I) {
2990 IRBuilder<> IRB(&I);
2991 Value *A = I.getOperand(0);
2992 Value *B = I.getOperand(1);
2993 Value *Sa = getShadow(A);
2994 Value *Sb = getShadow(B);
2995
2996 // Get rid of pointers and vectors of pointers.
2997 // For ints (and vectors of ints), types of A and Sa match,
2998 // and this is a no-op.
2999 A = IRB.CreatePointerCast(A, Sa->getType());
3000 B = IRB.CreatePointerCast(B, Sb->getType());
3001
3002 // Let [a0, a1] be the interval of possible values of A, taking into account
3003 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3004 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
3005 bool IsSigned = I.isSigned();
3006
3007 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3008 if (IsSigned) {
3009 // Sign-flip to map from signed range to unsigned range. Relation A vs B
3010 // should be preserved, if checked with `getUnsignedPredicate()`.
3011 // Relationship between Amin, Amax, Bmin, Bmax also will not be
3012 // affected, as they are created by effectively adding/subtracting from
3013 // A (or B) a value, derived from shadow, with no overflow, either
3014 // before or after sign flip.
3015 APInt MinVal =
3016 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3017 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3018 }
3019 // Minimize undefined bits.
3020 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3021 Value *Max = IRB.CreateOr(V, S);
3022 return std::make_pair(Min, Max);
3023 };
3024
3025 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3026 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3027 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3028 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3029
3030 Value *Si = IRB.CreateXor(S1, S2);
3031 setShadow(&I, Si);
3032 setOriginForNaryOp(I);
3033 }
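// Worked unsigned example for A > B: let A = 0b10? (Sa = 0b001) and let
// B = 0b011 be fully defined. Then [Amin, Amax] = [0b100, 0b101] and
// Bmin = Bmax = 0b011; both Amin > Bmax and Amax > Bmin hold, S1 == S2, and
// the comparison shadow is 0: the result is known despite A's poisoned bit.
// With B = 0b100 instead, Amin > Bmax is false while Amax > Bmin is true,
// so S1 ^ S2 == 1 and the result is reported as uninitialized.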
3034
3035 /// Instrument signed relational comparisons.
3036 ///
3037 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3038 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3039 void handleSignedRelationalComparison(ICmpInst &I) {
3040 Constant *constOp;
3041 Value *op = nullptr;
3042 CmpInst::Predicate pre;
3043 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3044 op = I.getOperand(0);
3045 pre = I.getPredicate();
3046 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3047 op = I.getOperand(1);
3048 pre = I.getSwappedPredicate();
3049 } else {
3050 handleShadowOr(I);
3051 return;
3052 }
3053
3054 if ((constOp->isNullValue() &&
3055 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3056 (constOp->isAllOnesValue() &&
3057 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3058 IRBuilder<> IRB(&I);
3059 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3060 "_msprop_icmp_s");
3061 setShadow(&I, Shadow);
3062 setOrigin(&I, getOrigin(op));
3063 } else {
3064 handleShadowOr(I);
3065 }
3066 }
3067
3068 void visitICmpInst(ICmpInst &I) {
3069 if (!ClHandleICmp) {
3070 handleShadowOr(I);
3071 return;
3072 }
3073 if (I.isEquality()) {
3074 handleEqualityComparison(I);
3075 return;
3076 }
3077
3078 assert(I.isRelational());
3079 if (ClHandleICmpExact) {
3080 handleRelationalComparisonExact(I);
3081 return;
3082 }
3083 if (I.isSigned()) {
3084 handleSignedRelationalComparison(I);
3085 return;
3086 }
3087
3088 assert(I.isUnsigned());
3089 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3090 handleRelationalComparisonExact(I);
3091 return;
3092 }
3093
3094 handleShadowOr(I);
3095 }
3096
3097 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3098
3099 void handleShift(BinaryOperator &I) {
3100 IRBuilder<> IRB(&I);
3101 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3102 // Otherwise perform the same shift on S1.
3103 Value *S1 = getShadow(&I, 0);
3104 Value *S2 = getShadow(&I, 1);
3105 Value *S2Conv =
3106 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3107 Value *V2 = I.getOperand(1);
3108 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3109 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3110 setOriginForNaryOp(I);
3111 }
3112
3113 void visitShl(BinaryOperator &I) { handleShift(I); }
3114 void visitAShr(BinaryOperator &I) { handleShift(I); }
3115 void visitLShr(BinaryOperator &I) { handleShift(I); }
3116
3117 void handleFunnelShift(IntrinsicInst &I) {
3118 IRBuilder<> IRB(&I);
3119 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3120 // Otherwise perform the same shift on S0 and S1.
3121 Value *S0 = getShadow(&I, 0);
3122 Value *S1 = getShadow(&I, 1);
3123 Value *S2 = getShadow(&I, 2);
3124 Value *S2Conv =
3125 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3126 Value *V2 = I.getOperand(2);
3127 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3128 {S0, S1, V2});
3129 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3130 setOriginForNaryOp(I);
3131 }
3132
3133 /// Instrument llvm.memmove
3134 ///
3135 /// At this point we don't know if llvm.memmove will be inlined or not.
3136 /// If we don't instrument it and it gets inlined,
3137 /// our interceptor will not kick in and we will lose the memmove.
3138 /// If we instrument the call here, but it does not get inlined,
3139 /// we will memmove the shadow twice, which is bad in the case
3140 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3141 ///
3142 /// Similar situation exists for memcpy and memset.
3143 void visitMemMoveInst(MemMoveInst &I) {
3144 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3145 IRBuilder<> IRB(&I);
3146 IRB.CreateCall(MS.MemmoveFn,
3147 {I.getArgOperand(0), I.getArgOperand(1),
3148 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3149 I.eraseFromParent();
3150 }
3151
3152 /// Instrument memcpy
3153 ///
3154 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3155 /// unfortunate as it may slow down small constant memcpys.
3156 /// FIXME: consider doing manual inline for small constant sizes and proper
3157 /// alignment.
3158 ///
3159 /// Note: This also handles memcpy.inline, which promises no calls to external
3160 /// functions as an optimization. However, with instrumentation enabled this
3161 /// is difficult to promise; additionally, we know that the MSan runtime
3162 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3163 /// instrumentation it's safe to turn memcpy.inline into a call to
3164 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3165 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3166 void visitMemCpyInst(MemCpyInst &I) {
3167 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3168 IRBuilder<> IRB(&I);
3169 IRB.CreateCall(MS.MemcpyFn,
3170 {I.getArgOperand(0), I.getArgOperand(1),
3171 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3172 I.eraseFromParent();
3173 }
3174
3175 // Same as memcpy.
3176 void visitMemSetInst(MemSetInst &I) {
3177 IRBuilder<> IRB(&I);
3178 IRB.CreateCall(
3179 MS.MemsetFn,
3180 {I.getArgOperand(0),
3181 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3182 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3183 I.eraseFromParent();
3184 }
3185
3186 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3187
3188 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3189
3190 /// Handle vector store-like intrinsics.
3191 ///
3192 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3193 /// has 1 pointer argument and 1 vector argument, returns void.
3194 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3195 assert(I.arg_size() == 2);
3196
3197 IRBuilder<> IRB(&I);
3198 Value *Addr = I.getArgOperand(0);
3199 Value *Shadow = getShadow(&I, 1);
3200 Value *ShadowPtr, *OriginPtr;
3201
3202 // We don't know the pointer alignment (could be unaligned SSE store!).
3203 // Have to assume the worst case.
3204 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3205 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3206 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3207
3208 if (ClCheckAccessAddress)
3209 insertCheckShadowOf(Addr, &I);
3210
3211 // FIXME: factor out common code from materializeStores
3212 if (MS.TrackOrigins)
3213 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3214 return true;
3215 }
3216
3217 /// Handle vector load-like intrinsics.
3218 ///
3219 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3220 /// has 1 pointer argument, returns a vector.
3221 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3222 assert(I.arg_size() == 1);
3223
3224 IRBuilder<> IRB(&I);
3225 Value *Addr = I.getArgOperand(0);
3226
3227 Type *ShadowTy = getShadowTy(&I);
3228 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3229 if (PropagateShadow) {
3230 // We don't know the pointer alignment (could be unaligned SSE load!).
3231 // Have to assume the worst case.
3232 const Align Alignment = Align(1);
3233 std::tie(ShadowPtr, OriginPtr) =
3234 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3235 setShadow(&I,
3236 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3237 } else {
3238 setShadow(&I, getCleanShadow(&I));
3239 }
3240
3241 if (ClCheckAccessAddress)
3242 insertCheckShadowOf(Addr, &I);
3243
3244 if (MS.TrackOrigins) {
3245 if (PropagateShadow)
3246 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3247 else
3248 setOrigin(&I, getCleanOrigin());
3249 }
3250 return true;
3251 }
3252
3253 /// Handle (SIMD arithmetic)-like intrinsics.
3254 ///
3255 /// Instrument intrinsics with any number of arguments of the same type [*],
3256 /// equal to the return type, plus a specified number of trailing flags of
3257 /// any type.
3258 ///
3259 /// [*] The type should be simple (no aggregates or pointers; vectors are
3260 /// fine).
3261 ///
3262 /// Caller guarantees that this intrinsic does not access memory.
3263 ///
3264 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3265 /// by this handler. See horizontalReduce().
3266 ///
3267 /// TODO: permutation intrinsics are also often incorrectly matched.
3268 [[maybe_unused]] bool
3269 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3270 unsigned int trailingFlags) {
3271 Type *RetTy = I.getType();
3272 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3273 return false;
3274
3275 unsigned NumArgOperands = I.arg_size();
3276 assert(NumArgOperands >= trailingFlags);
3277 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3278 Type *Ty = I.getArgOperand(i)->getType();
3279 if (Ty != RetTy)
3280 return false;
3281 }
3282
3283 IRBuilder<> IRB(&I);
3284 ShadowAndOriginCombiner SC(this, IRB);
3285 for (unsigned i = 0; i < NumArgOperands; ++i)
3286 SC.Add(I.getArgOperand(i));
3287 SC.Done(&I);
3288
3289 return true;
3290 }
3291
3292 /// Returns whether it was able to heuristically instrument unknown
3293 /// intrinsics.
3294 ///
3295 /// The main purpose of this code is to do something reasonable with all
3296 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3297 /// We recognize several classes of intrinsics by their argument types and
3298 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3299 /// sure that we know what the intrinsic does.
3300 ///
3301 /// We special-case intrinsics where this approach fails. See llvm.bswap
3302 /// handling as an example of that.
3303 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3304 unsigned NumArgOperands = I.arg_size();
3305 if (NumArgOperands == 0)
3306 return false;
3307
3308 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3309 I.getArgOperand(1)->getType()->isVectorTy() &&
3310 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3311 // This looks like a vector store.
3312 return handleVectorStoreIntrinsic(I);
3313 }
3314
3315 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3316 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3317 // This looks like a vector load.
3318 return handleVectorLoadIntrinsic(I);
3319 }
3320
3321 if (I.doesNotAccessMemory())
3322 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3323 return true;
3324
3325 // FIXME: detect and handle SSE maskstore/maskload?
3326 // Some cases are now handled in handleAVXMasked{Load,Store}.
3327 return false;
3328 }
3329
3330 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3331 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3332 if (ClDumpStrictIntrinsics)
3333 dumpInst(I);
3334
3335 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3336 << "\n");
3337 return true;
3338 } else
3339 return false;
3340 }
3341
3342 void handleInvariantGroup(IntrinsicInst &I) {
3343 setShadow(&I, getShadow(&I, 0));
3344 setOrigin(&I, getOrigin(&I, 0));
3345 }
3346
3347 void handleLifetimeStart(IntrinsicInst &I) {
3348 if (!PoisonStack)
3349 return;
3350 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3351 if (AI)
3352 LifetimeStartList.push_back(std::make_pair(&I, AI));
3353 }
3354
3355 void handleBswap(IntrinsicInst &I) {
3356 IRBuilder<> IRB(&I);
3357 Value *Op = I.getArgOperand(0);
3358 Type *OpType = Op->getType();
3359 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3360 getShadow(Op)));
3361 setOrigin(&I, getOrigin(Op));
3362 }
3363
3364 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3365 // and a 1. If the input is all zero, the output is fully initialized iff
3366 // !is_zero_poison.
3367 //
3368 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3369 // concrete value 0/1, and ? is an uninitialized bit:
3370 // - 0001 0??? is fully initialized
3371 // - 000? ???? is fully uninitialized (*)
3372 // - ???? ???? is fully uninitialized
3373 // - 0000 0000 is fully uninitialized if is_zero_poison,
3374 // fully initialized otherwise
3375 //
3376 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3377 // only need to poison 4 bits.
3378 //
3379 // OutputShadow =
3380 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3381 // || (is_zero_poison && AllZeroSrc)
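//
// Worked example (illustrative, 8-bit ctlz): for Src = 0001 0??? the shadow is
// 0000 0111, so ShadowZerosCount = ctlz(0000 0111) = 5 while
// ConcreteZerosCount = 3; since 3 < 5 the position of the leading one cannot be
// affected by the poisoned bits and the result is clean. For Src = 000? ????
// the shadow is 0001 1111, so ShadowZerosCount = 3 and ConcreteZerosCount >= 3,
// and the result is poisoned.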
3382 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3383 IRBuilder<> IRB(&I);
3384 Value *Src = I.getArgOperand(0);
3385 Value *SrcShadow = getShadow(Src);
3386
3387 Value *False = IRB.getInt1(false);
3388 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3389 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3390 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3391 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3392
3393 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3394 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3395
3396 Value *NotAllZeroShadow =
3397 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3398 Value *OutputShadow =
3399 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3400
3401 // If zero poison is requested, mix in with the shadow
3402 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3403 if (!IsZeroPoison->isZeroValue()) {
3404 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3405 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3406 }
3407
3408 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3409
3410 setShadow(&I, OutputShadow);
3411 setOriginForNaryOp(I);
3412 }
3413
3414 /// Handle Arm NEON vector convert intrinsics.
3415 ///
3416 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3417 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64(double)
3418 ///
3419 /// For x86 SSE vector convert intrinsics, see
3420 /// handleSSEVectorConvertIntrinsic().
3421 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I) {
3422 assert(I.arg_size() == 1);
3423
3424 IRBuilder<> IRB(&I);
3425 Value *S0 = getShadow(&I, 0);
3426
3427 /// For scalars:
3428 /// Since they are converting from floating-point to integer, the output is
3429 /// - fully uninitialized if *any* bit of the input is uninitialized
3430 /// - fully initialized if all bits of the input are initialized
3431 /// We apply the same principle on a per-field basis for vectors.
3432 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3433 getShadowTy(&I));
3434 setShadow(&I, OutShadow);
3435 setOriginForNaryOp(I);
3436 }
3437
3438 /// Some instructions have additional zero-elements in the return type
3439 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3440 ///
3441 /// This function will return a vector type with the same number of elements
3442 /// as the input, but the same per-element width as the return value, e.g.,
3443 /// <8 x i8>.
3444 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3445 assert(isa<FixedVectorType>(getShadowTy(&I)));
3446 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3447
3448 // TODO: generalize beyond 2x?
3449 if (ShadowType->getElementCount() ==
3450 cast<VectorType>(Src->getType())->getElementCount() * 2)
3451 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3452
3453 assert(ShadowType->getElementCount() ==
3454 cast<VectorType>(Src->getType())->getElementCount());
3455
3456 return ShadowType;
3457 }
3458
3459 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3460 /// to match the length of the shadow for the instruction.
3461 /// If scalar types of the vectors are different, it will use the type of the
3462 /// input vector.
3463 /// This is more type-safe than CreateShadowCast().
3464 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3465 IRBuilder<> IRB(&I);
3467 assert(isa<FixedVectorType>(I.getType()));
3468
3469 Value *FullShadow = getCleanShadow(&I);
3470 unsigned ShadowNumElems =
3471 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3472 unsigned FullShadowNumElems =
3473 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3474
3475 assert((ShadowNumElems == FullShadowNumElems) ||
3476 (ShadowNumElems * 2 == FullShadowNumElems));
3477
3478 if (ShadowNumElems == FullShadowNumElems) {
3479 FullShadow = Shadow;
3480 } else {
3481 // TODO: generalize beyond 2x?
3482 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3483 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3484
3485 // Append zeros
3486 FullShadow =
3487 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3488 }
3489
3490 return FullShadow;
3491 }
3492
3493 /// Handle x86 SSE vector conversion.
3494 ///
3495 /// e.g., single-precision to half-precision conversion:
3496 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3497 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3498 ///
3499 /// floating-point to integer:
3500 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3501 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3502 ///
3503 /// Note: if the output has more elements, they are zero-initialized (and
3504 /// therefore the shadow will also be initialized).
3505 ///
3506 /// This differs from handleSSEVectorConvertIntrinsic() because it
3507 /// propagates uninitialized shadow (instead of checking the shadow).
3508 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3509 bool HasRoundingMode) {
3510 if (HasRoundingMode) {
3511 assert(I.arg_size() == 2);
3512 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3513 assert(RoundingMode->getType()->isIntegerTy());
3514 } else {
3515 assert(I.arg_size() == 1);
3516 }
3517
3518 Value *Src = I.getArgOperand(0);
3519 assert(Src->getType()->isVectorTy());
3520
3521 // The return type might have more elements than the input.
3522 // Temporarily shrink the return type's number of elements.
3523 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3524
3525 IRBuilder<> IRB(&I);
3526 Value *S0 = getShadow(&I, 0);
3527
3528 /// For scalars:
3529 /// Since they are converting to and/or from floating-point, the output is:
3530 /// - fully uninitialized if *any* bit of the input is uninitialized
3531 /// - fully initialized if all bits of the input are initialized
3532 /// We apply the same principle on a per-field basis for vectors.
3533 Value *Shadow =
3534 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3535
3536 // The return type might have more elements than the input.
3537 // Extend the return type back to its original width if necessary.
3538 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3539
3540 setShadow(&I, FullShadow);
3541 setOriginForNaryOp(I);
3542 }
3543
3544 // Instrument x86 SSE vector convert intrinsic.
3545 //
3546 // This function instruments intrinsics like cvtsi2ss:
3547 // %Out = int_xxx_cvtyyy(%ConvertOp)
3548 // or
3549 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3550 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3551 // number of \p Out elements, and (if it has 2 arguments) copies the rest of the
3552 // elements from \p CopyOp.
3553 // In most cases the conversion involves a floating-point value, which may trigger a
3554 // hardware exception when not fully initialized. For this reason we require
3555 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3556 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3557 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3558 // return a fully initialized value.
3559 //
3560 // For Arm NEON vector convert intrinsics, see
3561 // handleNEONVectorConvertIntrinsic().
3562 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3563 bool HasRoundingMode = false) {
3564 IRBuilder<> IRB(&I);
3565 Value *CopyOp, *ConvertOp;
3566
3567 assert((!HasRoundingMode ||
3568 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3569 "Invalid rounding mode");
3570
3571 switch (I.arg_size() - HasRoundingMode) {
3572 case 2:
3573 CopyOp = I.getArgOperand(0);
3574 ConvertOp = I.getArgOperand(1);
3575 break;
3576 case 1:
3577 ConvertOp = I.getArgOperand(0);
3578 CopyOp = nullptr;
3579 break;
3580 default:
3581 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3582 }
3583
3584 // The first *NumUsedElements* elements of ConvertOp are converted to the
3585 // same number of output elements. The rest of the output is copied from
3586 // CopyOp, or (if not available) filled with zeroes.
3587 // Combine shadow for elements of ConvertOp that are used in this operation,
3588 // and insert a check.
3589 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3590 // int->any conversion.
3591 Value *ConvertShadow = getShadow(ConvertOp);
3592 Value *AggShadow = nullptr;
3593 if (ConvertOp->getType()->isVectorTy()) {
3594 AggShadow = IRB.CreateExtractElement(
3595 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3596 for (int i = 1; i < NumUsedElements; ++i) {
3597 Value *MoreShadow = IRB.CreateExtractElement(
3598 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3599 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3600 }
3601 } else {
3602 AggShadow = ConvertShadow;
3603 }
3604 assert(AggShadow->getType()->isIntegerTy());
3605 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3606
3607 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3608 // ConvertOp.
3609 if (CopyOp) {
3610 assert(CopyOp->getType() == I.getType());
3611 assert(CopyOp->getType()->isVectorTy());
3612 Value *ResultShadow = getShadow(CopyOp);
3613 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3614 for (int i = 0; i < NumUsedElements; ++i) {
3615 ResultShadow = IRB.CreateInsertElement(
3616 ResultShadow, ConstantInt::getNullValue(EltTy),
3617 ConstantInt::get(IRB.getInt32Ty(), i));
3618 }
3619 setShadow(&I, ResultShadow);
3620 setOrigin(&I, getOrigin(CopyOp));
3621 } else {
3622 setShadow(&I, getCleanShadow(&I));
3623 setOrigin(&I, getCleanOrigin());
3624 }
3625 }
3626
3627 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3628 // zeroes if it is zero, and all ones otherwise.
3629 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3630 if (S->getType()->isVectorTy())
3631 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3632 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3633 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3634 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3635 }
3636
3637 // Given a vector, extract its first element, and return all
3638 // zeroes if it is zero, and all ones otherwise.
3639 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3640 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3641 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3642 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3643 }
3644
3645 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3646 Type *T = S->getType();
3647 assert(T->isVectorTy());
3648 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3649 return IRB.CreateSExt(S2, T);
3650 }
3651
3652 // Instrument vector shift intrinsic.
3653 //
3654 // This function instruments intrinsics like int_x86_avx2_psll_w.
3655 // Intrinsic shifts %In by %ShiftSize bits.
3656 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3657 // size, and the rest is ignored. Behavior is defined even if shift size is
3658 // greater than register (or field) width.
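// e.g., (illustrative) for llvm.x86.avx2.psll.w the low 64 bits of the second
// operand give a single shift count applied to every 16-bit lane; the same
// count is applied to the first operand's shadow, and if any bit of the shift
// count's shadow is set the whole result is poisoned.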
3659 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3660 assert(I.arg_size() == 2);
3661 IRBuilder<> IRB(&I);
3662 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3663 // Otherwise perform the same shift on S1.
3664 Value *S1 = getShadow(&I, 0);
3665 Value *S2 = getShadow(&I, 1);
3666 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3667 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3668 Value *V1 = I.getOperand(0);
3669 Value *V2 = I.getOperand(1);
3670 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3671 {IRB.CreateBitCast(S1, V1->getType()), V2});
3672 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3673 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3674 setOriginForNaryOp(I);
3675 }
3676
3677 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3678 // vectors.
3679 Type *getMMXVectorTy(unsigned EltSizeInBits,
3680 unsigned X86_MMXSizeInBits = 64) {
3681 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3682 "Illegal MMX vector element size");
3683 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3684 X86_MMXSizeInBits / EltSizeInBits);
3685 }
3686
3687 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3688 // intrinsic.
3689 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3690 switch (id) {
3691 case Intrinsic::x86_sse2_packsswb_128:
3692 case Intrinsic::x86_sse2_packuswb_128:
3693 return Intrinsic::x86_sse2_packsswb_128;
3694
3695 case Intrinsic::x86_sse2_packssdw_128:
3696 case Intrinsic::x86_sse41_packusdw:
3697 return Intrinsic::x86_sse2_packssdw_128;
3698
3699 case Intrinsic::x86_avx2_packsswb:
3700 case Intrinsic::x86_avx2_packuswb:
3701 return Intrinsic::x86_avx2_packsswb;
3702
3703 case Intrinsic::x86_avx2_packssdw:
3704 case Intrinsic::x86_avx2_packusdw:
3705 return Intrinsic::x86_avx2_packssdw;
3706
3707 case Intrinsic::x86_mmx_packsswb:
3708 case Intrinsic::x86_mmx_packuswb:
3709 return Intrinsic::x86_mmx_packsswb;
3710
3711 case Intrinsic::x86_mmx_packssdw:
3712 return Intrinsic::x86_mmx_packssdw;
3713
3714 case Intrinsic::x86_avx512_packssdw_512:
3715 case Intrinsic::x86_avx512_packusdw_512:
3716 return Intrinsic::x86_avx512_packssdw_512;
3717
3718 case Intrinsic::x86_avx512_packsswb_512:
3719 case Intrinsic::x86_avx512_packuswb_512:
3720 return Intrinsic::x86_avx512_packsswb_512;
3721
3722 default:
3723 llvm_unreachable("unexpected intrinsic id");
3724 }
3725 }
3726
3727 // Instrument vector pack intrinsic.
3728 //
3729 // This function instruments intrinsics like x86_mmx_packsswb, that
3730 // packs elements of 2 input vectors into half as many bits with saturation.
3731 // Shadow is propagated with the signed variant of the same intrinsic applied
3732 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3733 // MMXEltSizeInBits is used only for x86mmx arguments.
3734 //
3735 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
3736 void handleVectorPackIntrinsic(IntrinsicInst &I,
3737 unsigned MMXEltSizeInBits = 0) {
3738 assert(I.arg_size() == 2);
3739 IRBuilder<> IRB(&I);
3740 Value *S1 = getShadow(&I, 0);
3741 Value *S2 = getShadow(&I, 1);
3742 assert(S1->getType()->isVectorTy());
3743
3744 // SExt and ICmpNE below must apply to individual elements of input vectors.
3745 // In case of x86mmx arguments, cast them to appropriate vector types and
3746 // back.
3747 Type *T =
3748 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3749 if (MMXEltSizeInBits) {
3750 S1 = IRB.CreateBitCast(S1, T);
3751 S2 = IRB.CreateBitCast(S2, T);
3752 }
3753 Value *S1_ext =
3754 IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3755 Value *S2_ext =
3756 IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3757 if (MMXEltSizeInBits) {
3758 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3759 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3760 }
3761
3762 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3763 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3764 "_msprop_vector_pack");
3765 if (MMXEltSizeInBits)
3766 S = IRB.CreateBitCast(S, getShadowTy(&I));
3767 setShadow(&I, S);
3768 setOriginForNaryOp(I);
3769 }
3770
3771 // Convert `Mask` into `<n x i1>`.
3772 Constant *createDppMask(unsigned Width, unsigned Mask) {
3773 SmallVector<Constant *, 4> R(Width);
3774 for (auto &M : R) {
3775 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3776 Mask >>= 1;
3777 }
3778 return ConstantVector::get(R);
3779 }
3780
3781 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3782 // arg is poisoned, entire dot product is poisoned.
3783 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3784 unsigned DstMask) {
3785 const unsigned Width =
3786 cast<FixedVectorType>(S->getType())->getNumElements();
3787
3788 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3789 Constant::getNullValue(S->getType()));
3790 Value *SElem = IRB.CreateOrReduce(S);
3791 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3792 Value *DstMaskV = createDppMask(Width, DstMask);
3793
3794 return IRB.CreateSelect(
3795 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3796 }
3797
3798 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3799 //
3800 // The 2- and 4-element versions produce a single dot-product scalar and then
3801 // put it into the elements of the output vector selected by the 4 lowest bits
3802 // of the mask. The top 4 bits of the mask control which elements of the input
3803 // to use for the dot product.
3804 //
3805 // The 8-element version's mask still has only 4 bits for the input and 4 bits
3806 // for the output mask. According to the spec it simply operates as the
3807 // 4-element version on the first 4 elements of the inputs and output, and then
3808 // on the last 4 elements of the inputs and output.
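//
// Worked example (illustrative): for the 4-element version with mask 0x31,
// SrcMask = 0x3 selects input elements 0 and 1 for the dot product, and
// DstMask = 0x1 writes the result only to output element 0. If either selected
// input element is poisoned, output element 0 becomes poisoned; the masked-out
// output elements stay clean.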
3809 void handleDppIntrinsic(IntrinsicInst &I) {
3810 IRBuilder<> IRB(&I);
3811
3812 Value *S0 = getShadow(&I, 0);
3813 Value *S1 = getShadow(&I, 1);
3814 Value *S = IRB.CreateOr(S0, S1);
3815
3816 const unsigned Width =
3817 cast<FixedVectorType>(S->getType())->getNumElements();
3818 assert(Width == 2 || Width == 4 || Width == 8);
3819
3820 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3821 const unsigned SrcMask = Mask >> 4;
3822 const unsigned DstMask = Mask & 0xf;
3823
3824 // Calculate shadow as `<n x i1>`.
3825 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3826 if (Width == 8) {
3827 // The first 4 elements of the shadow are already calculated.
3828 // `findDppPoisonedOutput` operates on 32-bit masks, so we can just shift the masks and repeat.
3829 SI1 = IRB.CreateOr(
3830 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3831 }
3832 // Extend to real size of shadow, poisoning either all or none bits of an
3833 // element.
3834 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3835
3836 setShadow(&I, S);
3837 setOriginForNaryOp(I);
3838 }
3839
3840 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3841 C = CreateAppToShadowCast(IRB, C);
3842 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3843 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3844 C = IRB.CreateAShr(C, ElSize - 1);
3845 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3846 return IRB.CreateTrunc(C, FVT);
3847 }
3848
3849 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3850 void handleBlendvIntrinsic(IntrinsicInst &I) {
3851 Value *C = I.getOperand(2);
3852 Value *T = I.getOperand(1);
3853 Value *F = I.getOperand(0);
3854
3855 Value *Sc = getShadow(&I, 2);
3856 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3857
3858 {
3859 IRBuilder<> IRB(&I);
3860 // Extract top bit from condition and its shadow.
3861 C = convertBlendvToSelectMask(IRB, C);
3862 Sc = convertBlendvToSelectMask(IRB, Sc);
3863
3864 setShadow(C, Sc);
3865 setOrigin(C, Oc);
3866 }
3867
3868 handleSelectLikeInst(I, C, T, F);
3869 }
3870
3871 // Instrument sum-of-absolute-differences intrinsic.
3872 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3873 const unsigned SignificantBitsPerResultElement = 16;
3874 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3875 unsigned ZeroBitsPerResultElement =
3876 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3877
3878 IRBuilder<> IRB(&I);
3879 auto *Shadow0 = getShadow(&I, 0);
3880 auto *Shadow1 = getShadow(&I, 1);
3881 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3882 S = IRB.CreateBitCast(S, ResTy);
3883 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3884 ResTy);
3885 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3886 S = IRB.CreateBitCast(S, getShadowTy(&I));
3887 setShadow(&I, S);
3888 setOriginForNaryOp(I);
3889 }
3890
3891 // Instrument multiply-add(-accumulate)? intrinsics.
3892 //
3893 // e.g., Two operands:
3894 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3895 //
3896 // Two operands which require an EltSizeInBits override:
3897 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3898 //
3899 // Three operands:
3900 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3901 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3902 // (this is equivalent to multiply-add on %a and %b, followed by
3903 // adding/"accumulating" %s. "Accumulation" stores the result in one
3904 // of the source registers, but this accumulate vs. add distinction
3905 // is lost when dealing with LLVM intrinsics.)
3906 void handleVectorPmaddIntrinsic(IntrinsicInst &I, unsigned ReductionFactor,
3907 unsigned EltSizeInBits = 0) {
3908 IRBuilder<> IRB(&I);
3909
3910 [[maybe_unused]] FixedVectorType *ReturnType =
3911 cast<FixedVectorType>(I.getType());
3912 assert(isa<FixedVectorType>(ReturnType));
3913
3914 // Vectors A and B, and shadows
3915 Value *Va = nullptr;
3916 Value *Vb = nullptr;
3917 Value *Sa = nullptr;
3918 Value *Sb = nullptr;
3919
3920 assert(I.arg_size() == 2 || I.arg_size() == 3);
3921 if (I.arg_size() == 2) {
3922 Va = I.getOperand(0);
3923 Vb = I.getOperand(1);
3924
3925 Sa = getShadow(&I, 0);
3926 Sb = getShadow(&I, 1);
3927 } else if (I.arg_size() == 3) {
3928 // Operand 0 is the accumulator. We will deal with that below.
3929 Va = I.getOperand(1);
3930 Vb = I.getOperand(2);
3931
3932 Sa = getShadow(&I, 1);
3933 Sb = getShadow(&I, 2);
3934 }
3935
3936 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
3937 assert(ParamType == Vb->getType());
3938
3939 assert(ParamType->getPrimitiveSizeInBits() ==
3940 ReturnType->getPrimitiveSizeInBits());
3941
3942 if (I.arg_size() == 3) {
3943 [[maybe_unused]] auto *AccumulatorType =
3944 cast<FixedVectorType>(I.getOperand(0)->getType());
3945 assert(AccumulatorType == ReturnType);
3946 }
3947
3948 FixedVectorType *ImplicitReturnType = ReturnType;
3949 // Step 1: instrument multiplication of corresponding vector elements
3950 if (EltSizeInBits) {
3951 ImplicitReturnType = cast<FixedVectorType>(
3952 getMMXVectorTy(EltSizeInBits * ReductionFactor,
3953 ParamType->getPrimitiveSizeInBits()));
3954 ParamType = cast<FixedVectorType>(
3955 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
3956
3957 Va = IRB.CreateBitCast(Va, ParamType);
3958 Vb = IRB.CreateBitCast(Vb, ParamType);
3959
3960 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
3961 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
3962 } else {
3963 assert(ParamType->getNumElements() ==
3964 ReturnType->getNumElements() * ReductionFactor);
3965 }
3966
3967 // Multiplying an *initialized* zero by an uninitialized element results in
3968 // an initialized zero element.
3969 //
3970 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
3971 // results in an unpoisoned value. We can therefore adapt the visitAnd()
3972 // instrumentation:
3973 // OutShadow = (SaNonZero & SbNonZero)
3974 // | (VaNonZero & SbNonZero)
3975 // | (SaNonZero & VbNonZero)
3976 // where non-zero is checked on a per-element basis (not per bit).
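// e.g., (illustrative) if Va[k] is a fully initialized 0, the product
// Va[k]*Vb[k] is 0 no matter what Vb[k] holds, so a poisoned Vb[k] does not
// poison that element: SaNonZero[k] and VaNonZero[k] are both false, so all
// three OR'd terms are false for element k.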
3977 Value *SZero = Constant::getNullValue(Va->getType());
3978 Value *VZero = Constant::getNullValue(Sa->getType());
3979 Value *SaNonZero = IRB.CreateICmpNE(Sa, SZero);
3980 Value *SbNonZero = IRB.CreateICmpNE(Sb, SZero);
3981 Value *VaNonZero = IRB.CreateICmpNE(Va, VZero);
3982 Value *VbNonZero = IRB.CreateICmpNE(Vb, VZero);
3983
3984 Value *SaAndSbNonZero = IRB.CreateAnd(SaNonZero, SbNonZero);
3985 Value *VaAndSbNonZero = IRB.CreateAnd(VaNonZero, SbNonZero);
3986 Value *SaAndVbNonZero = IRB.CreateAnd(SaNonZero, VbNonZero);
3987
3988 // Each element of the vector is represented by a single bit (poisoned or
3989 // not) e.g., <8 x i1>.
3990 Value *And = IRB.CreateOr({SaAndSbNonZero, VaAndSbNonZero, SaAndVbNonZero});
3991
3992 // Extend <8 x i1> to <8 x i16>.
3993 // (The real pmadd intrinsic would have computed intermediate values of
3994 // <8 x i32>, but that is irrelevant for our shadow purposes because we
3995 // consider each element to be either fully initialized or fully
3996 // uninitialized.)
3997 And = IRB.CreateSExt(And, Sa->getType());
3998
3999 // Step 2: instrument horizontal add
4000 // We don't need bit-precise horizontalReduce because we only want to check
4001 // if each pair/quad of elements is fully zero.
4002 // Cast to <4 x i32>.
4003 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4004
4005 // Compute <4 x i1>, then extend back to <4 x i32>.
4006 Value *OutShadow = IRB.CreateSExt(
4007 IRB.CreateICmpNE(Horizontal,
4008 Constant::getNullValue(Horizontal->getType())),
4009 ImplicitReturnType);
4010
4011 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4012 // AVX, it is already correct).
4013 if (EltSizeInBits)
4014 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4015
4016 // Step 3 (if applicable): instrument accumulator
4017 if (I.arg_size() == 3)
4018 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4019
4020 setShadow(&I, OutShadow);
4021 setOriginForNaryOp(I);
4022 }
4023
4024 // Instrument compare-packed intrinsic.
4025 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4026 // all-ones shadow.
4027 void handleVectorComparePackedIntrinsic(IntrinsicInst &I) {
4028 IRBuilder<> IRB(&I);
4029 Type *ResTy = getShadowTy(&I);
4030 auto *Shadow0 = getShadow(&I, 0);
4031 auto *Shadow1 = getShadow(&I, 1);
4032 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4033 Value *S = IRB.CreateSExt(
4034 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4035 setShadow(&I, S);
4036 setOriginForNaryOp(I);
4037 }
4038
4039 // Instrument compare-scalar intrinsic.
4040 // This handles both cmp* intrinsics which return the result in the first
4041 // element of a vector, and comi* which return the result as i32.
4042 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4043 IRBuilder<> IRB(&I);
4044 auto *Shadow0 = getShadow(&I, 0);
4045 auto *Shadow1 = getShadow(&I, 1);
4046 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4047 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4048 setShadow(&I, S);
4049 setOriginForNaryOp(I);
4050 }
4051
4052 // Instrument generic vector reduction intrinsics
4053 // by ORing together all their fields.
4054 //
4055 // If AllowShadowCast is true, the return type does not need to be the same
4056 // type as the fields
4057 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4058 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4059 assert(I.arg_size() == 1);
4060
4061 IRBuilder<> IRB(&I);
4062 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4063 if (AllowShadowCast)
4064 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4065 else
4066 assert(S->getType() == getShadowTy(&I));
4067 setShadow(&I, S);
4068 setOriginForNaryOp(I);
4069 }
4070
4071 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4072 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4073 // %a1)
4074 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4075 //
4076 // The type of the return value, initial starting value, and elements of the
4077 // vector must be identical.
4078 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4079 assert(I.arg_size() == 2);
4080
4081 IRBuilder<> IRB(&I);
4082 Value *Shadow0 = getShadow(&I, 0);
4083 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4084 assert(Shadow0->getType() == Shadow1->getType());
4085 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4086 assert(S->getType() == getShadowTy(&I));
4087 setShadow(&I, S);
4088 setOriginForNaryOp(I);
4089 }
4090
4091 // Instrument vector.reduce.or intrinsic.
4092 // Valid (non-poisoned) set bits in the operand pull low the
4093 // corresponding shadow bits.
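// e.g., (illustrative) reducing <2 x i8> {0000 0001, ???? ????}: bit 0 of the
// result is 1 regardless of the poisoned element, so its shadow bit is clean;
// every other result bit inherits the poisoned element's shadow.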
4094 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4095 assert(I.arg_size() == 1);
4096
4097 IRBuilder<> IRB(&I);
4098 Value *OperandShadow = getShadow(&I, 0);
4099 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4100 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4101 // Bit N is clean if any field's bit N is 1 and unpoisoned
4102 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4103 // Otherwise, it is clean if every field's bit N is unpoisoned
4104 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4105 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4106
4107 setShadow(&I, S);
4108 setOrigin(&I, getOrigin(&I, 0));
4109 }
4110
4111 // Instrument vector.reduce.and intrinsic.
4112 // Valid (non-poisoned) unset bits in the operand pull down the
4113 // corresponding shadow bits.
4114 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4115 assert(I.arg_size() == 1);
4116
4117 IRBuilder<> IRB(&I);
4118 Value *OperandShadow = getShadow(&I, 0);
4119 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4120 // Bit N is clean if any field's bit N is 0 and unpoisoned
4121 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4122 // Otherwise, it is clean if every field's bit N is unpoisoned
4123 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4124 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4125
4126 setShadow(&I, S);
4127 setOrigin(&I, getOrigin(&I, 0));
4128 }
4129
4130 void handleStmxcsr(IntrinsicInst &I) {
4131 IRBuilder<> IRB(&I);
4132 Value *Addr = I.getArgOperand(0);
4133 Type *Ty = IRB.getInt32Ty();
4134 Value *ShadowPtr =
4135 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4136
4137 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4138
4139 if (ClCheckAccessAddress)
4140 insertCheckShadowOf(Addr, &I);
4141 }
4142
4143 void handleLdmxcsr(IntrinsicInst &I) {
4144 if (!InsertChecks)
4145 return;
4146
4147 IRBuilder<> IRB(&I);
4148 Value *Addr = I.getArgOperand(0);
4149 Type *Ty = IRB.getInt32Ty();
4150 const Align Alignment = Align(1);
4151 Value *ShadowPtr, *OriginPtr;
4152 std::tie(ShadowPtr, OriginPtr) =
4153 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4154
4155 if (ClCheckAccessAddress)
4156 insertCheckShadowOf(Addr, &I);
4157
4158 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4159 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4160 : getCleanOrigin();
4161 insertCheckShadow(Shadow, Origin, &I);
4162 }
4163
4164 void handleMaskedExpandLoad(IntrinsicInst &I) {
4165 IRBuilder<> IRB(&I);
4166 Value *Ptr = I.getArgOperand(0);
4167 MaybeAlign Align = I.getParamAlign(0);
4168 Value *Mask = I.getArgOperand(1);
4169 Value *PassThru = I.getArgOperand(2);
4170
4171 if (ClCheckAccessAddress) {
4172 insertCheckShadowOf(Ptr, &I);
4173 insertCheckShadowOf(Mask, &I);
4174 }
4175
4176 if (!PropagateShadow) {
4177 setShadow(&I, getCleanShadow(&I));
4178 setOrigin(&I, getCleanOrigin());
4179 return;
4180 }
4181
4182 Type *ShadowTy = getShadowTy(&I);
4183 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4184 auto [ShadowPtr, OriginPtr] =
4185 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4186
4187 Value *Shadow =
4188 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4189 getShadow(PassThru), "_msmaskedexpload");
4190
4191 setShadow(&I, Shadow);
4192
4193 // TODO: Store origins.
4194 setOrigin(&I, getCleanOrigin());
4195 }
4196
4197 void handleMaskedCompressStore(IntrinsicInst &I) {
4198 IRBuilder<> IRB(&I);
4199 Value *Values = I.getArgOperand(0);
4200 Value *Ptr = I.getArgOperand(1);
4201 MaybeAlign Align = I.getParamAlign(1);
4202 Value *Mask = I.getArgOperand(2);
4203
4204 if (ClCheckAccessAddress) {
4205 insertCheckShadowOf(Ptr, &I);
4206 insertCheckShadowOf(Mask, &I);
4207 }
4208
4209 Value *Shadow = getShadow(Values);
4210 Type *ElementShadowTy =
4211 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4212 auto [ShadowPtr, OriginPtrs] =
4213 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4214
4215 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4216
4217 // TODO: Store origins.
4218 }
4219
4220 void handleMaskedGather(IntrinsicInst &I) {
4221 IRBuilder<> IRB(&I);
4222 Value *Ptrs = I.getArgOperand(0);
4223 const Align Alignment = I.getParamAlign(0).valueOrOne();
4224 Value *Mask = I.getArgOperand(1);
4225 Value *PassThru = I.getArgOperand(2);
4226
4227 Type *PtrsShadowTy = getShadowTy(Ptrs);
4228 if (ClCheckAccessAddress) {
4229 insertCheckShadowOf(Mask, &I);
4230 Value *MaskedPtrShadow = IRB.CreateSelect(
4231 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4232 "_msmaskedptrs");
4233 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4234 }
4235
4236 if (!PropagateShadow) {
4237 setShadow(&I, getCleanShadow(&I));
4238 setOrigin(&I, getCleanOrigin());
4239 return;
4240 }
4241
4242 Type *ShadowTy = getShadowTy(&I);
4243 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4244 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4245 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4246
4247 Value *Shadow =
4248 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4249 getShadow(PassThru), "_msmaskedgather");
4250
4251 setShadow(&I, Shadow);
4252
4253 // TODO: Store origins.
4254 setOrigin(&I, getCleanOrigin());
4255 }
4256
4257 void handleMaskedScatter(IntrinsicInst &I) {
4258 IRBuilder<> IRB(&I);
4259 Value *Values = I.getArgOperand(0);
4260 Value *Ptrs = I.getArgOperand(1);
4261 const Align Alignment = I.getParamAlign(1).valueOrOne();
4262 Value *Mask = I.getArgOperand(2);
4263
4264 Type *PtrsShadowTy = getShadowTy(Ptrs);
4265 if (ClCheckAccessAddress) {
4266 insertCheckShadowOf(Mask, &I);
4267 Value *MaskedPtrShadow = IRB.CreateSelect(
4268 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4269 "_msmaskedptrs");
4270 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4271 }
4272
4273 Value *Shadow = getShadow(Values);
4274 Type *ElementShadowTy =
4275 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4276 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4277 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4278
4279 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4280
4281 // TODO: Store origin.
4282 }
4283
4284 // Intrinsic::masked_store
4285 //
4286 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4287 // stores are lowered to Intrinsic::masked_store.
4288 void handleMaskedStore(IntrinsicInst &I) {
4289 IRBuilder<> IRB(&I);
4290 Value *V = I.getArgOperand(0);
4291 Value *Ptr = I.getArgOperand(1);
4292 const Align Alignment = I.getParamAlign(1).valueOrOne();
4293 Value *Mask = I.getArgOperand(2);
4294 Value *Shadow = getShadow(V);
4295
4296 if (ClCheckAccessAddress) {
4297 insertCheckShadowOf(Ptr, &I);
4298 insertCheckShadowOf(Mask, &I);
4299 }
4300
4301 Value *ShadowPtr;
4302 Value *OriginPtr;
4303 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4304 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4305
4306 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4307
4308 if (!MS.TrackOrigins)
4309 return;
4310
4311 auto &DL = F.getDataLayout();
4312 paintOrigin(IRB, getOrigin(V), OriginPtr,
4313 DL.getTypeStoreSize(Shadow->getType()),
4314 std::max(Alignment, kMinOriginAlignment));
4315 }
4316
4317 // Intrinsic::masked_load
4318 //
4319 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4320 // loads are lowered to Intrinsic::masked_load.
4321 void handleMaskedLoad(IntrinsicInst &I) {
4322 IRBuilder<> IRB(&I);
4323 Value *Ptr = I.getArgOperand(0);
4324 const Align Alignment = I.getParamAlign(0).valueOrOne();
4325 Value *Mask = I.getArgOperand(1);
4326 Value *PassThru = I.getArgOperand(2);
4327
4328 if (ClCheckAccessAddress) {
4329 insertCheckShadowOf(Ptr, &I);
4330 insertCheckShadowOf(Mask, &I);
4331 }
4332
4333 if (!PropagateShadow) {
4334 setShadow(&I, getCleanShadow(&I));
4335 setOrigin(&I, getCleanOrigin());
4336 return;
4337 }
4338
4339 Type *ShadowTy = getShadowTy(&I);
4340 Value *ShadowPtr, *OriginPtr;
4341 std::tie(ShadowPtr, OriginPtr) =
4342 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4343 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4344 getShadow(PassThru), "_msmaskedld"));
4345
4346 if (!MS.TrackOrigins)
4347 return;
4348
4349 // Choose between PassThru's and the loaded value's origins.
4350 Value *MaskedPassThruShadow = IRB.CreateAnd(
4351 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4352
4353 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4354
4355 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4356 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4357
4358 setOrigin(&I, Origin);
4359 }
4360
4361 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4362 // dst mask src
4363 //
4364 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4365 // by handleMaskedStore.
4366 //
4367 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4368 // vector of integers, unlike the LLVM masked intrinsics, which require a
4369 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4370 // mentions that the x86 backend does not know how to efficiently convert
4371 // from a vector of booleans back into the AVX mask format; therefore, they
4372 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4373 // intrinsics.
4374 void handleAVXMaskedStore(IntrinsicInst &I) {
4375 assert(I.arg_size() == 3);
4376
4377 IRBuilder<> IRB(&I);
4378
4379 Value *Dst = I.getArgOperand(0);
4380 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4381
4382 Value *Mask = I.getArgOperand(1);
4383 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4384
4385 Value *Src = I.getArgOperand(2);
4386 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4387
4388 const Align Alignment = Align(1);
4389
4390 Value *SrcShadow = getShadow(Src);
4391
4392 if (ClCheckAccessAddress) {
4393 insertCheckShadowOf(Dst, &I);
4394 insertCheckShadowOf(Mask, &I);
4395 }
4396
4397 Value *DstShadowPtr;
4398 Value *DstOriginPtr;
4399 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4400 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4401
4402 SmallVector<Value *, 2> ShadowArgs;
4403 ShadowArgs.append(1, DstShadowPtr);
4404 ShadowArgs.append(1, Mask);
4405 // The intrinsic may require floating-point but shadows can be arbitrary
4406 // bit patterns, of which some would be interpreted as "invalid"
4407 // floating-point values (NaN etc.); we assume the intrinsic will happily
4408 // copy them.
4409 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4410
4411 CallInst *CI =
4412 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4413 setShadow(&I, CI);
4414
4415 if (!MS.TrackOrigins)
4416 return;
4417
4418 // Approximation only
4419 auto &DL = F.getDataLayout();
4420 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4421 DL.getTypeStoreSize(SrcShadow->getType()),
4422 std::max(Alignment, kMinOriginAlignment));
4423 }
4424
4425 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4426 // return src mask
4427 //
4428 // Masked-off values are replaced with 0, which conveniently also represents
4429 // initialized memory.
4430 //
4431 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4432 // by handleMaskedLoad.
4433 //
4434 // We do not combine this with handleMaskedLoad; see comment in
4435 // handleAVXMaskedStore for the rationale.
4436 //
4437 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4438 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4439 // parameter.
4440 void handleAVXMaskedLoad(IntrinsicInst &I) {
4441 assert(I.arg_size() == 2);
4442
4443 IRBuilder<> IRB(&I);
4444
4445 Value *Src = I.getArgOperand(0);
4446 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4447
4448 Value *Mask = I.getArgOperand(1);
4449 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4450
4451 const Align Alignment = Align(1);
4452
4453 if (ClCheckAccessAddress) {
4454 insertCheckShadowOf(Mask, &I);
4455 }
4456
4457 Type *SrcShadowTy = getShadowTy(Src);
4458 Value *SrcShadowPtr, *SrcOriginPtr;
4459 std::tie(SrcShadowPtr, SrcOriginPtr) =
4460 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4461
4462 SmallVector<Value *, 2> ShadowArgs;
4463 ShadowArgs.append(1, SrcShadowPtr);
4464 ShadowArgs.append(1, Mask);
4465
4466 CallInst *CI =
4467 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4468 // The AVX masked load intrinsics do not have integer variants. We use the
4469 // floating-point variants, which will happily copy the shadows even if
4470 // they are interpreted as "invalid" floating-point values (NaN etc.).
4471 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4472
4473 if (!MS.TrackOrigins)
4474 return;
4475
4476 // The "pass-through" value is always zero (initialized). To the extent
4477 // that this results in initialized aligned 4-byte chunks, the origin value
4478 // is ignored. It is therefore correct to simply copy the origin from src.
4479 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4480 setOrigin(&I, PtrSrcOrigin);
4481 }
4482
4483 // Test whether the mask indices are initialized, only checking the bits that
4484 // are actually used.
4485 //
4486 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4487 // used/checked.
4488 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4489 assert(isFixedIntVector(Idx));
4490 auto IdxVectorSize =
4491 cast<FixedVectorType>(Idx->getType())->getNumElements();
4492 assert(isPowerOf2_64(IdxVectorSize));
4493
4494 // A constant index has a clean shadow, so the check below would be a no-op; skip it.
4495 if (isa<Constant>(Idx))
4496 return;
4497
4498 auto *IdxShadow = getShadow(Idx);
4499 Value *Truncated = IRB.CreateTrunc(
4500 IdxShadow,
4501 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4502 IdxVectorSize));
4503 insertCheckShadow(Truncated, getOrigin(Idx), I);
4504 }
4505
4506 // Instrument AVX permutation intrinsic.
4507 // We apply the same permutation (argument index 1) to the shadow.
4508 void handleAVXVpermilvar(IntrinsicInst &I) {
4509 IRBuilder<> IRB(&I);
4510 Value *Shadow = getShadow(&I, 0);
4511 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4512
4513 // Shadows are integer-ish types but some intrinsics require a
4514 // different (e.g., floating-point) type.
4515 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4516 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4517 {Shadow, I.getArgOperand(1)});
4518
4519 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4520 setOriginForNaryOp(I);
4521 }
4522
4523 // Instrument AVX permutation intrinsic.
4524 // We apply the same permutation (argument index 1) to the shadows.
4525 void handleAVXVpermi2var(IntrinsicInst &I) {
4526 assert(I.arg_size() == 3);
4527 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4528 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4529 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4530 [[maybe_unused]] auto ArgVectorSize =
4531 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4532 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4533 ->getNumElements() == ArgVectorSize);
4534 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4535 ->getNumElements() == ArgVectorSize);
4536 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4537 assert(I.getType() == I.getArgOperand(0)->getType());
4538 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4539 IRBuilder<> IRB(&I);
4540 Value *AShadow = getShadow(&I, 0);
4541 Value *Idx = I.getArgOperand(1);
4542 Value *BShadow = getShadow(&I, 2);
4543
4544 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4545
4546 // Shadows are integer-ish types but some intrinsics require a
4547 // different (e.g., floating-point) type.
4548 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4549 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4550 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4551 {AShadow, Idx, BShadow});
4552 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4553 setOriginForNaryOp(I);
4554 }
4555
4556 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4557 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4558 }
4559
4560 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4561 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4562 }
4563
4564 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4565 return isFixedIntVectorTy(V->getType());
4566 }
4567
4568 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4569 return isFixedFPVectorTy(V->getType());
4570 }
4571
4572 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4573 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4574 // i32 rounding)
4575 //
4576 // Inconveniently, some similar intrinsics have a different operand order:
4577 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4578 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4579 // i16 mask)
4580 //
4581 // If the return type has more elements than A, the excess elements are
4582 // zeroed (and the corresponding shadow is initialized).
4583 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4584 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4585 // i8 mask)
4586 //
4587 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4588 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4589 // where all_or_nothing(x) is fully uninitialized if x has any
4590 // uninitialized bits
4591 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4592 IRBuilder<> IRB(&I);
4593
4594 assert(I.arg_size() == 4);
4595 Value *A = I.getOperand(0);
4596 Value *WriteThrough;
4597 Value *Mask;
4598 Value *RoundingMode;
4599 if (LastMask) {
4600 WriteThrough = I.getOperand(2);
4601 Mask = I.getOperand(3);
4602 RoundingMode = I.getOperand(1);
4603 } else {
4604 WriteThrough = I.getOperand(1);
4605 Mask = I.getOperand(2);
4606 RoundingMode = I.getOperand(3);
4607 }
4608
4609 assert(isFixedFPVector(A));
4610 assert(isFixedIntVector(WriteThrough));
4611
4612 unsigned ANumElements =
4613 cast<FixedVectorType>(A->getType())->getNumElements();
4614 [[maybe_unused]] unsigned WriteThruNumElements =
4615 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4616 assert(ANumElements == WriteThruNumElements ||
4617 ANumElements * 2 == WriteThruNumElements);
4618
4619 assert(Mask->getType()->isIntegerTy());
4620 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4621 assert(ANumElements == MaskNumElements ||
4622 ANumElements * 2 == MaskNumElements);
4623
4624 assert(WriteThruNumElements == MaskNumElements);
4625
4626 // Some bits of the mask may be unused, though it's unusual to have partly
4627 // uninitialized bits.
4628 insertCheckShadowOf(Mask, &I);
4629
4630 assert(RoundingMode->getType()->isIntegerTy());
4631 // Only some bits of the rounding mode are used, though it's very
4632 // unusual to have uninitialized bits there (more commonly, it's a
4633 // constant).
4634 insertCheckShadowOf(RoundingMode, &I);
4635
4636 assert(I.getType() == WriteThrough->getType());
4637
4638 Value *AShadow = getShadow(A);
4639 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4640
4641 if (ANumElements * 2 == MaskNumElements) {
4642 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4643 // from the zeroed shadow instead of the writethrough's shadow.
4644 Mask =
4645 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4646 Mask =
4647 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4648 }
4649
4650 // Convert i16 mask to <16 x i1>
4651 Mask = IRB.CreateBitCast(
4652 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4653 "_ms_mask_bitcast");
4654
4655 /// For floating-point to integer conversion, the output is:
4656 /// - fully uninitialized if *any* bit of the input is uninitialized
4657 /// - fully initialized if all bits of the input are initialized
4658 /// We apply the same principle on a per-element basis for vectors.
4659 ///
4660 /// We use the scalar width of the return type instead of A's.
4661 AShadow = IRB.CreateSExt(
4662 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4663 getShadowTy(&I), "_ms_a_shadow");
4664
4665 Value *WriteThroughShadow = getShadow(WriteThrough);
4666 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4667 "_ms_writethru_select");
4668
4669 setShadow(&I, Shadow);
4670 setOriginForNaryOp(I);
4671 }
4672
4673 // Instrument BMI / BMI2 intrinsics.
4674 // All of these intrinsics are Z = I(X, Y)
4675 // where the types of all operands and the result match, and are either i32 or
4676 // i64. The following instrumentation happens to work for all of them:
4677 // Sz = I(Sx, Y) | (sext (Sy != 0))
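// Worked example (illustrative): for pext(X, Y) with Y fully initialized
// (Sy == 0), the result shadow is simply pext(Sx, Y): the shadow bits of X
// are gathered into the same positions as the corresponding data bits. If
// any bit of Y is uninitialized (Sy != 0), the sext term below makes the
// entire result shadow all-ones.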
4678 void handleBmiIntrinsic(IntrinsicInst &I) {
4679 IRBuilder<> IRB(&I);
4680 Type *ShadowTy = getShadowTy(&I);
4681
4682 // If any bit of the mask operand is poisoned, then the whole thing is.
4683 Value *SMask = getShadow(&I, 1);
4684 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4685 ShadowTy);
4686 // Apply the same intrinsic to the shadow of the first operand.
4687 Value *S = IRB.CreateCall(I.getCalledFunction(),
4688 {getShadow(&I, 0), I.getOperand(1)});
4689 S = IRB.CreateOr(SMask, S);
4690 setShadow(&I, S);
4691 setOriginForNaryOp(I);
4692 }
4693
4694 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4695 SmallVector<int, 8> Mask;
4696 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4697 Mask.append(2, X);
4698 }
4699 return Mask;
4700 }
4701
4702 // Instrument pclmul intrinsics.
4703 // These intrinsics operate either on odd or on even elements of the input
4704 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4705 // Replace the unused elements with copies of the used ones, ex:
4706 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4707 // or
4708 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4709 // and then apply the usual shadow combining logic.
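// For example, getPclmulMask(4, /*OddElements=*/false) returns {0, 0, 2, 2},
// so an operand shadow (s0, s1, s2, s3) is shuffled to (s0, s0, s2, s2)
// before the usual shadow combining (an element-wise OR of the two shuffled
// operand shadows) is applied.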
4710 void handlePclmulIntrinsic(IntrinsicInst &I) {
4711 IRBuilder<> IRB(&I);
4712 unsigned Width =
4713 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4714 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4715 "pclmul 3rd operand must be a constant");
4716 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4717 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4718 getPclmulMask(Width, Imm & 0x01));
4719 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4720 getPclmulMask(Width, Imm & 0x10));
4721 ShadowAndOriginCombiner SOC(this, IRB);
4722 SOC.Add(Shuf0, getOrigin(&I, 0));
4723 SOC.Add(Shuf1, getOrigin(&I, 1));
4724 SOC.Done(&I);
4725 }
4726
4727 // Instrument _mm_*_sd|ss intrinsics
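// For example, with Width == 4 the shuffle mask built below is <4, 1, 2, 3>:
// lane 0 of the result shadow comes from the second operand's shadow, and
// lanes 1..3 come from the first operand's shadow, mirroring the scalar
// ss/sd semantics (the operation applies to the low lane of the second
// operand; the upper lanes pass through from the first).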
4728 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4729 IRBuilder<> IRB(&I);
4730 unsigned Width =
4731 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4732 Value *First = getShadow(&I, 0);
4733 Value *Second = getShadow(&I, 1);
4734 // First element of second operand, remaining elements of first operand
4735 SmallVector<int, 16> Mask;
4736 Mask.push_back(Width);
4737 for (unsigned i = 1; i < Width; i++)
4738 Mask.push_back(i);
4739 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4740
4741 setShadow(&I, Shadow);
4742 setOriginForNaryOp(I);
4743 }
4744
4745 void handleVtestIntrinsic(IntrinsicInst &I) {
4746 IRBuilder<> IRB(&I);
4747 Value *Shadow0 = getShadow(&I, 0);
4748 Value *Shadow1 = getShadow(&I, 1);
4749 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4750 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4751 Value *Scalar = convertShadowToScalar(NZ, IRB);
4752 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4753
4754 setShadow(&I, Shadow);
4755 setOriginForNaryOp(I);
4756 }
4757
4758 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4759 IRBuilder<> IRB(&I);
4760 unsigned Width =
4761 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4762 Value *First = getShadow(&I, 0);
4763 Value *Second = getShadow(&I, 1);
4764 Value *OrShadow = IRB.CreateOr(First, Second);
4765 // First element of both OR'd together, remaining elements of first operand
4766 SmallVector<int, 16> Mask;
4767 Mask.push_back(Width);
4768 for (unsigned i = 1; i < Width; i++)
4769 Mask.push_back(i);
4770 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4771
4772 setShadow(&I, Shadow);
4773 setOriginForNaryOp(I);
4774 }
4775
4776 // _mm_round_pd / _mm_round_ps.
4777 // Similar to maybeHandleSimpleNomemIntrinsic except
4778 // the second argument is guaranteed to be a constant integer.
4779 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4780 assert(I.getArgOperand(0)->getType() == I.getType());
4781 assert(I.arg_size() == 2);
4782 assert(isa<ConstantInt>(I.getArgOperand(1)));
4783
4784 IRBuilder<> IRB(&I);
4785 ShadowAndOriginCombiner SC(this, IRB);
4786 SC.Add(I.getArgOperand(0));
4787 SC.Done(&I);
4788 }
4789
4790 // Instrument @llvm.abs intrinsic.
4791 //
4792 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4793 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
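// For example (illustrative): with i32 and is_int_min_poison == true, the
// handler below selects a fully uninitialized shadow when Src == INT_MIN
// (the intrinsic's result is poison in that case); in all other cases the
// source shadow is propagated unchanged.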
4794 void handleAbsIntrinsic(IntrinsicInst &I) {
4795 assert(I.arg_size() == 2);
4796 Value *Src = I.getArgOperand(0);
4797 Value *IsIntMinPoison = I.getArgOperand(1);
4798
4799 assert(I.getType()->isIntOrIntVectorTy());
4800
4801 assert(Src->getType() == I.getType());
4802
4803 assert(IsIntMinPoison->getType()->isIntegerTy());
4804 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4805
4806 IRBuilder<> IRB(&I);
4807 Value *SrcShadow = getShadow(Src);
4808
4809 APInt MinVal =
4810 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4811 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4812 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4813
4814 Value *PoisonedShadow = getPoisonedShadow(Src);
4815 Value *PoisonedIfIntMinShadow =
4816 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4817 Value *Shadow =
4818 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4819
4820 setShadow(&I, Shadow);
4821 setOrigin(&I, getOrigin(&I, 0));
4822 }
4823
4824 void handleIsFpClass(IntrinsicInst &I) {
4825 IRBuilder<> IRB(&I);
4826 Value *Shadow = getShadow(&I, 0);
4827 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4828 setOrigin(&I, getOrigin(&I, 0));
4829 }
4830
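// e.g., {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
// Roughly: the value field's shadow is Sa | Sb, and the overflow-flag shadow
// is (Sa | Sb) != 0, i.e., the flag is uninitialized iff any input bit is.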
4831 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4832 IRBuilder<> IRB(&I);
4833 Value *Shadow0 = getShadow(&I, 0);
4834 Value *Shadow1 = getShadow(&I, 1);
4835 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4836 Value *ShadowElt1 =
4837 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4838
4839 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4840 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4841 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4842
4843 setShadow(&I, Shadow);
4844 setOriginForNaryOp(I);
4845 }
4846
4847 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4848 assert(isa<FixedVectorType>(V->getType()));
4849 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4850 Value *Shadow = getShadow(V);
4851 return IRB.CreateExtractElement(Shadow,
4852 ConstantInt::get(IRB.getInt32Ty(), 0));
4853 }
4854
4855 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4856 //
4857 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4858 // (<8 x i64>, <16 x i8>, i8)
4859 // A WriteThru Mask
4860 //
4861 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4862 // (<16 x i32>, <16 x i8>, i16)
4863 //
4864 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4865 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4866 //
4867 // If Dst has more elements than A, the excess elements are zeroed (and the
4868 // corresponding shadow is initialized).
4869 //
4870 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4871 // and is much faster than this handler.
4872 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4873 IRBuilder<> IRB(&I);
4874
4875 assert(I.arg_size() == 3);
4876 Value *A = I.getOperand(0);
4877 Value *WriteThrough = I.getOperand(1);
4878 Value *Mask = I.getOperand(2);
4879
4880 assert(isFixedIntVector(A));
4881 assert(isFixedIntVector(WriteThrough));
4882
4883 unsigned ANumElements =
4884 cast<FixedVectorType>(A->getType())->getNumElements();
4885 unsigned OutputNumElements =
4886 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4887 assert(ANumElements == OutputNumElements ||
4888 ANumElements * 2 == OutputNumElements);
4889
4890 assert(Mask->getType()->isIntegerTy());
4891 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4892 insertCheckShadowOf(Mask, &I);
4893
4894 assert(I.getType() == WriteThrough->getType());
4895
4896 // Widen the mask, if necessary, to have one bit per element of the output
4897 // vector.
4898 // We want the extra bits to have '1's, so that the CreateSelect will
4899 // select the values from AShadow instead of WriteThroughShadow ("maskless"
4900 // versions of the intrinsics are sometimes implemented using an all-1's
4901 // mask and an undefined value for WriteThroughShadow). We accomplish this
4902 // by using bitwise NOT before and after the ZExt.
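// For example (illustrative bit pattern): an i8 mask 0b00000011 widened to
// i16 becomes 0b1111111100000011; NOT gives 0b11111100, ZExt gives
// 0b0000000011111100, and the final NOT sets the new high bits to 1.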
4903 if (ANumElements != OutputNumElements) {
4904 Mask = IRB.CreateNot(Mask);
4905 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
4906 "_ms_widen_mask");
4907 Mask = IRB.CreateNot(Mask);
4908 }
4909 Mask = IRB.CreateBitCast(
4910 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4911
4912 Value *AShadow = getShadow(A);
4913
4914 // The return type might have more elements than the input.
4915 // Temporarily shrink the return type's number of elements.
4916 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
4917
4918 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
4919 // This handler treats them all as truncation, which leads to some rare
4920 // false positives in the cases where the truncated bytes could
4921 // unambiguously saturate the value e.g., if A = ??????10 ????????
4922 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
4923 // fully defined, but the truncated byte is ????????.
4924 //
4925 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
4926 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
4927 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4928
4929 Value *WriteThroughShadow = getShadow(WriteThrough);
4930
4931 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
4932 setShadow(&I, Shadow);
4933 setOriginForNaryOp(I);
4934 }
4935
4936 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
4937 // values and perform an operation whose shadow propagation should be handled
4938 // as all-or-nothing [*], with masking provided by a vector and a mask
4939 // supplied as an integer.
4940 //
4941 // [*] if all bits of a vector element are initialized, the output is fully
4942 // initialized; otherwise, the output is fully uninitialized
4943 //
4944 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
4945 // (<16 x float>, <16 x float>, i16)
4946 // A WriteThru Mask
4947 //
4948 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
4949 // (<2 x double>, <2 x double>, i8)
4950 //
4951 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
4952 // (<8 x double>, i32, <8 x double>, i8, i32)
4953 // A Imm WriteThru Mask Rounding
4954 //
4955 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
4956 // be fully initialized.
4957 //
4958 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
4959 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
4960 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
4961 unsigned WriteThruIndex,
4962 unsigned MaskIndex) {
4963 IRBuilder<> IRB(&I);
4964
4965 unsigned NumArgs = I.arg_size();
4966 assert(AIndex < NumArgs);
4967 assert(WriteThruIndex < NumArgs);
4968 assert(MaskIndex < NumArgs);
4969 assert(AIndex != WriteThruIndex);
4970 assert(AIndex != MaskIndex);
4971 assert(WriteThruIndex != MaskIndex);
4972
4973 Value *A = I.getOperand(AIndex);
4974 Value *WriteThru = I.getOperand(WriteThruIndex);
4975 Value *Mask = I.getOperand(MaskIndex);
4976
4977 assert(isFixedFPVector(A));
4978 assert(isFixedFPVector(WriteThru));
4979
4980 [[maybe_unused]] unsigned ANumElements =
4981 cast<FixedVectorType>(A->getType())->getNumElements();
4982 unsigned OutputNumElements =
4983 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
4984 assert(ANumElements == OutputNumElements);
4985
4986 for (unsigned i = 0; i < NumArgs; ++i) {
4987 if (i != AIndex && i != WriteThruIndex) {
4988 // Imm, Mask, Rounding etc. are "control" data, hence we require that
4989 // they be fully initialized.
4990 assert(I.getOperand(i)->getType()->isIntegerTy());
4991 insertCheckShadowOf(I.getOperand(i), &I);
4992 }
4993 }
4994
4995 // The mask has 1 bit per element of A, but a minimum of 8 bits.
4996 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
4997 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
4998 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4999
5000 assert(I.getType() == WriteThru->getType());
5001
5002 Mask = IRB.CreateBitCast(
5003 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5004
5005 Value *AShadow = getShadow(A);
5006
5007 // All-or-nothing shadow
5008 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
5009 AShadow->getType());
5010
5011 Value *WriteThruShadow = getShadow(WriteThru);
5012
5013 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
5014 setShadow(&I, Shadow);
5015
5016 setOriginForNaryOp(I);
5017 }
5018
5019 // For sh.* compiler intrinsics:
5020 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5021 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5022 // A B WriteThru Mask RoundingMode
5023 //
5024 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5025 // DstShadow[1..7] = AShadow[1..7]
5026 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5027 IRBuilder<> IRB(&I);
5028
5029 assert(I.arg_size() == 5);
5030 Value *A = I.getOperand(0);
5031 Value *B = I.getOperand(1);
5032 Value *WriteThrough = I.getOperand(2);
5033 Value *Mask = I.getOperand(3);
5034 Value *RoundingMode = I.getOperand(4);
5035
5036 // Technically, we could probably just check whether the LSB is
5037 // initialized, but intuitively it feels like a partly uninitialized mask
5038 // is unintended, and we should warn the user immediately.
5039 insertCheckShadowOf(Mask, &I);
5040 insertCheckShadowOf(RoundingMode, &I);
5041
5042 assert(isa<FixedVectorType>(A->getType()));
5043 unsigned NumElements =
5044 cast<FixedVectorType>(A->getType())->getNumElements();
5045 assert(NumElements == 8);
5046 assert(A->getType() == B->getType());
5047 assert(B->getType() == WriteThrough->getType());
5048 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5049 assert(RoundingMode->getType()->isIntegerTy());
5050
5051 Value *ALowerShadow = extractLowerShadow(IRB, A);
5052 Value *BLowerShadow = extractLowerShadow(IRB, B);
5053
5054 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5055
5056 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5057
5058 Mask = IRB.CreateBitCast(
5059 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5060 Value *MaskLower =
5061 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5062
5063 Value *AShadow = getShadow(A);
5064 Value *DstLowerShadow =
5065 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5066 Value *DstShadow = IRB.CreateInsertElement(
5067 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5068 "_msprop");
5069
5070 setShadow(&I, DstShadow);
5071 setOriginForNaryOp(I);
5072 }
5073
5074 // Approximately handle AVX Galois Field Affine Transformation
5075 //
5076 // e.g.,
5077 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5078 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5079 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5080 // Out A x b
5081 // where A and x are packed matrices, b is a vector,
5082 // Out = A * x + b in GF(2)
5083 //
5084 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5085 // computation also includes a parity calculation.
5086 //
5087 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5088 // Out_Shadow = (V1_Shadow & V2_Shadow)
5089 // | (V1 & V2_Shadow)
5090 // | (V1_Shadow & V2 )
5091 //
5092 // We approximate the shadow of gf2p8affineqb using:
5093 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5094 // | gf2p8affineqb(x, A_shadow, 0)
5095 // | gf2p8affineqb(x_Shadow, A, 0)
5096 // | set1_epi8(b_Shadow)
5097 //
5098 // This approximation has false negatives: if an intermediate dot-product
5099 // contains an even number of 1's, the parity is 0.
5100 // It has no false positives.
5101 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5102 IRBuilder<> IRB(&I);
5103
5104 assert(I.arg_size() == 3);
5105 Value *A = I.getOperand(0);
5106 Value *X = I.getOperand(1);
5107 Value *B = I.getOperand(2);
5108
5109 assert(isFixedIntVector(A));
5110 assert(cast<VectorType>(A->getType())
5111 ->getElementType()
5112 ->getScalarSizeInBits() == 8);
5113
5114 assert(A->getType() == X->getType());
5115
5116 assert(B->getType()->isIntegerTy());
5117 assert(B->getType()->getScalarSizeInBits() == 8);
5118
5119 assert(I.getType() == A->getType());
5120
5121 Value *AShadow = getShadow(A);
5122 Value *XShadow = getShadow(X);
5123 Value *BZeroShadow = getCleanShadow(B);
5124
5125 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5126 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5127 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5128 {X, AShadow, BZeroShadow});
5129 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5130 {XShadow, A, BZeroShadow});
5131
5132 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5133 Value *BShadow = getShadow(B);
5134 Value *BBroadcastShadow = getCleanShadow(AShadow);
5135 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5136 // This loop generates a lot of LLVM IR, which we expect CodeGen will
5137 // lower appropriately (e.g., VPBROADCASTB).
5138 // Besides, b is often a constant, in which case it is fully initialized.
5139 for (unsigned i = 0; i < NumElements; i++)
5140 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5141
5142 setShadow(&I, IRB.CreateOr(
5143 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5144 setOriginForNaryOp(I);
5145 }
5146
5147 // Handle Arm NEON vector load intrinsics (vld*).
5148 //
5149 // The WithLane instructions (ld[234]lane) are similar to:
5150 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5151 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5152 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5153 // %A)
5154 //
5155 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5156 // to:
5157 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
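// Illustrative sketch (hypothetical value names): for
//   %r = call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
// the handler below loads the shadow by issuing the same intrinsic on the
// shadow address of %A:
//   %sr = call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %sA)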
5158 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5159 unsigned int numArgs = I.arg_size();
5160
5161 // Return type is a struct of vectors of integers or floating-point
5162 assert(I.getType()->isStructTy());
5163 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5164 assert(RetTy->getNumElements() > 0);
5165 assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5166 RetTy->getElementType(0)->isFPOrFPVectorTy());
5167 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5168 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5169
5170 if (WithLane) {
5171 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5172 assert(4 <= numArgs && numArgs <= 6);
5173
5174 // Return type is a struct of the input vectors
5175 assert(RetTy->getNumElements() + 2 == numArgs);
5176 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5177 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5178 } else {
5179 assert(numArgs == 1);
5180 }
5181
5182 IRBuilder<> IRB(&I);
5183
5184 SmallVector<Value *, 6> ShadowArgs;
5185 if (WithLane) {
5186 for (unsigned int i = 0; i < numArgs - 2; i++)
5187 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5188
5189 // Lane number, passed verbatim
5190 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5191 ShadowArgs.push_back(LaneNumber);
5192
5193 // TODO: blend shadow of lane number into output shadow?
5194 insertCheckShadowOf(LaneNumber, &I);
5195 }
5196
5197 Value *Src = I.getArgOperand(numArgs - 1);
5198 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5199
5200 Type *SrcShadowTy = getShadowTy(Src);
5201 auto [SrcShadowPtr, SrcOriginPtr] =
5202 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5203 ShadowArgs.push_back(SrcShadowPtr);
5204
5205 // The NEON vector load instructions handled by this function all have
5206 // integer variants. It is easier to use those rather than trying to cast
5207 // a struct of vectors of floats into a struct of vectors of integers.
5208 CallInst *CI =
5209 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5210 setShadow(&I, CI);
5211
5212 if (!MS.TrackOrigins)
5213 return;
5214
5215 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5216 setOrigin(&I, PtrSrcOrigin);
5217 }
5218
5219 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5220 /// and vst{2,3,4}lane).
5221 ///
5222 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5223 /// last argument, with the initial arguments being the inputs (and lane
5224 /// number for vst{2,3,4}lane). They return void.
5225 ///
5226 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5227 /// abcdabcdabcdabcd... into *outP
5228 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5229 /// writes aaaa...bbbb...cccc...dddd... into *outP
5230 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5231 /// These instructions can all be instrumented with essentially the same
5232 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
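/// Illustrative sketch (hypothetical value names): for
///   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %A, <16 x i8> %B, ptr %P)
/// the handler below emits roughly
///   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %sA, <16 x i8> %sB, ptr %sP)
/// where %sA/%sB are the input shadows and %sP is the shadow address of %P.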
5233 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5234 IRBuilder<> IRB(&I);
5235
5236 // Don't use getNumOperands() because it includes the callee
5237 int numArgOperands = I.arg_size();
5238
5239 // The last arg operand is the output (pointer)
5240 assert(numArgOperands >= 1);
5241 Value *Addr = I.getArgOperand(numArgOperands - 1);
5242 assert(Addr->getType()->isPointerTy());
5243 int skipTrailingOperands = 1;
5244
5245 if (ClCheckAccessAddress)
5246 insertCheckShadowOf(Addr, &I);
5247
5248 // Second-last operand is the lane number (for vst{2,3,4}lane)
5249 if (useLane) {
5250 skipTrailingOperands++;
5251 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5252 assert(isa<IntegerType>(
5253 I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5254 }
5255
5256 SmallVector<Value *, 8> ShadowArgs;
5257 // All the initial operands are the inputs
5258 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5259 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5260 Value *Shadow = getShadow(&I, i);
5261 ShadowArgs.append(1, Shadow);
5262 }
5263
5264 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5265 // e.g., for:
5266 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5267 // we know the type of the output (and its shadow) is <16 x i8>.
5268 //
5269 // Arm NEON VST is unusual because the last argument is the output address:
5270 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5271 // call void @llvm.aarch64.neon.st2.v16i8.p0
5272 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5273 // and we have no type information about P's operand. We must manually
5274 // compute the type (<16 x i8> x 2).
5275 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5276 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5277 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5278 (numArgOperands - skipTrailingOperands));
5279 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5280
5281 if (useLane)
5282 ShadowArgs.append(1,
5283 I.getArgOperand(numArgOperands - skipTrailingOperands));
5284
5285 Value *OutputShadowPtr, *OutputOriginPtr;
5286 // AArch64 NEON does not need alignment (unless OS requires it)
5287 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5288 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5289 ShadowArgs.append(1, OutputShadowPtr);
5290
5291 CallInst *CI =
5292 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5293 setShadow(&I, CI);
5294
5295 if (MS.TrackOrigins) {
5296 // TODO: if we modelled the vst* instruction more precisely, we could
5297 // more accurately track the origins (e.g., if both inputs are
5298 // uninitialized for vst2, we currently blame the second input, even
5299 // though part of the output depends only on the first input).
5300 //
5301 // This is particularly imprecise for vst{2,3,4}lane, since only one
5302 // lane of each input is actually copied to the output.
5303 OriginCombiner OC(this, IRB);
5304 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5305 OC.Add(I.getArgOperand(i));
5306
5307 const DataLayout &DL = F.getDataLayout();
5308 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5309 OutputOriginPtr);
5310 }
5311 }
5312
5313 /// Handle intrinsics by applying the intrinsic to the shadows.
5314 ///
5315 /// The trailing arguments are passed verbatim to the intrinsic, though any
5316 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5317 /// intrinsic with one trailing verbatim argument:
5318 /// out = intrinsic(var1, var2, opType)
5319 /// we compute:
5320 /// shadow[out] =
5321 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5322 ///
5323 /// Typically, shadowIntrinsicID will be specified by the caller to be
5324 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5325 /// intrinsic of the same type.
5326 ///
5327 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5328 /// bit-patterns (for example, if the intrinsic accepts floats for
5329 /// var1, we require that it doesn't care if inputs are NaNs).
5330 ///
5331 /// For example, this can be applied to the Arm NEON vector table intrinsics
5332 /// (tbl{1,2,3,4}).
5333 ///
5334 /// The origin is approximated using setOriginForNaryOp.
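/// Illustrative sketch (hypothetical value names): for
///   %r = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %a, <16 x i8> %b)
/// instrumented with trailingVerbatimArgs == 1, the shadow is roughly
///   %s = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %sa, <16 x i8> %b)
///   shadow[%r] = %s | %sb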
5335 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5336 Intrinsic::ID shadowIntrinsicID,
5337 unsigned int trailingVerbatimArgs) {
5338 IRBuilder<> IRB(&I);
5339
5340 assert(trailingVerbatimArgs < I.arg_size());
5341
5342 SmallVector<Value *, 8> ShadowArgs;
5343 // Don't use getNumOperands() because it includes the callee
5344 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5345 Value *Shadow = getShadow(&I, i);
5346
5347 // Shadows are integer-ish types but some intrinsics require a
5348 // different (e.g., floating-point) type.
5349 ShadowArgs.push_back(
5350 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5351 }
5352
5353 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5354 i++) {
5355 Value *Arg = I.getArgOperand(i);
5356 ShadowArgs.push_back(Arg);
5357 }
5358
5359 CallInst *CI =
5360 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5361 Value *CombinedShadow = CI;
5362
5363 // Combine the computed shadow with the shadow of trailing args
5364 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5365 i++) {
5366 Value *Shadow =
5367 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5368 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5369 }
5370
5371 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5372
5373 setOriginForNaryOp(I);
5374 }
5375
5376 // Approximation only
5377 //
5378 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5379 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5380 assert(I.arg_size() == 2);
5381
5382 handleShadowOr(I);
5383 }
5384
5385 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5386 switch (I.getIntrinsicID()) {
5387 case Intrinsic::uadd_with_overflow:
5388 case Intrinsic::sadd_with_overflow:
5389 case Intrinsic::usub_with_overflow:
5390 case Intrinsic::ssub_with_overflow:
5391 case Intrinsic::umul_with_overflow:
5392 case Intrinsic::smul_with_overflow:
5393 handleArithmeticWithOverflow(I);
5394 break;
5395 case Intrinsic::abs:
5396 handleAbsIntrinsic(I);
5397 break;
5398 case Intrinsic::bitreverse:
5399 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5400 /*trailingVerbatimArgs*/ 0);
5401 break;
5402 case Intrinsic::is_fpclass:
5403 handleIsFpClass(I);
5404 break;
5405 case Intrinsic::lifetime_start:
5406 handleLifetimeStart(I);
5407 break;
5408 case Intrinsic::launder_invariant_group:
5409 case Intrinsic::strip_invariant_group:
5410 handleInvariantGroup(I);
5411 break;
5412 case Intrinsic::bswap:
5413 handleBswap(I);
5414 break;
5415 case Intrinsic::ctlz:
5416 case Intrinsic::cttz:
5417 handleCountLeadingTrailingZeros(I);
5418 break;
5419 case Intrinsic::masked_compressstore:
5420 handleMaskedCompressStore(I);
5421 break;
5422 case Intrinsic::masked_expandload:
5423 handleMaskedExpandLoad(I);
5424 break;
5425 case Intrinsic::masked_gather:
5426 handleMaskedGather(I);
5427 break;
5428 case Intrinsic::masked_scatter:
5429 handleMaskedScatter(I);
5430 break;
5431 case Intrinsic::masked_store:
5432 handleMaskedStore(I);
5433 break;
5434 case Intrinsic::masked_load:
5435 handleMaskedLoad(I);
5436 break;
5437 case Intrinsic::vector_reduce_and:
5438 handleVectorReduceAndIntrinsic(I);
5439 break;
5440 case Intrinsic::vector_reduce_or:
5441 handleVectorReduceOrIntrinsic(I);
5442 break;
5443
5444 case Intrinsic::vector_reduce_add:
5445 case Intrinsic::vector_reduce_xor:
5446 case Intrinsic::vector_reduce_mul:
5447 // Signed/Unsigned Min/Max
5448 // TODO: handling similarly to AND/OR may be more precise.
5449 case Intrinsic::vector_reduce_smax:
5450 case Intrinsic::vector_reduce_smin:
5451 case Intrinsic::vector_reduce_umax:
5452 case Intrinsic::vector_reduce_umin:
5453 // TODO: this has no false positives, but arguably we should check that all
5454 // the bits are initialized.
5455 case Intrinsic::vector_reduce_fmax:
5456 case Intrinsic::vector_reduce_fmin:
5457 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5458 break;
5459
5460 case Intrinsic::vector_reduce_fadd:
5461 case Intrinsic::vector_reduce_fmul:
5462 handleVectorReduceWithStarterIntrinsic(I);
5463 break;
5464
5465 case Intrinsic::scmp:
5466 case Intrinsic::ucmp: {
5467 handleShadowOr(I);
5468 break;
5469 }
5470
5471 case Intrinsic::fshl:
5472 case Intrinsic::fshr:
5473 handleFunnelShift(I);
5474 break;
5475
5476 case Intrinsic::is_constant:
5477 // The result of llvm.is.constant() is always defined.
5478 setShadow(&I, getCleanShadow(&I));
5479 setOrigin(&I, getCleanOrigin());
5480 break;
5481
5482 default:
5483 return false;
5484 }
5485
5486 return true;
5487 }
5488
5489 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5490 switch (I.getIntrinsicID()) {
5491 case Intrinsic::x86_sse_stmxcsr:
5492 handleStmxcsr(I);
5493 break;
5494 case Intrinsic::x86_sse_ldmxcsr:
5495 handleLdmxcsr(I);
5496 break;
5497
5498 // Convert Scalar Double Precision Floating-Point Value
5499 // to Unsigned Doubleword Integer
5500 // etc.
5501 case Intrinsic::x86_avx512_vcvtsd2usi64:
5502 case Intrinsic::x86_avx512_vcvtsd2usi32:
5503 case Intrinsic::x86_avx512_vcvtss2usi64:
5504 case Intrinsic::x86_avx512_vcvtss2usi32:
5505 case Intrinsic::x86_avx512_cvttss2usi64:
5506 case Intrinsic::x86_avx512_cvttss2usi:
5507 case Intrinsic::x86_avx512_cvttsd2usi64:
5508 case Intrinsic::x86_avx512_cvttsd2usi:
5509 case Intrinsic::x86_avx512_cvtusi2ss:
5510 case Intrinsic::x86_avx512_cvtusi642sd:
5511 case Intrinsic::x86_avx512_cvtusi642ss:
5512 handleSSEVectorConvertIntrinsic(I, 1, true);
5513 break;
5514 case Intrinsic::x86_sse2_cvtsd2si64:
5515 case Intrinsic::x86_sse2_cvtsd2si:
5516 case Intrinsic::x86_sse2_cvtsd2ss:
5517 case Intrinsic::x86_sse2_cvttsd2si64:
5518 case Intrinsic::x86_sse2_cvttsd2si:
5519 case Intrinsic::x86_sse_cvtss2si64:
5520 case Intrinsic::x86_sse_cvtss2si:
5521 case Intrinsic::x86_sse_cvttss2si64:
5522 case Intrinsic::x86_sse_cvttss2si:
5523 handleSSEVectorConvertIntrinsic(I, 1);
5524 break;
5525 case Intrinsic::x86_sse_cvtps2pi:
5526 case Intrinsic::x86_sse_cvttps2pi:
5527 handleSSEVectorConvertIntrinsic(I, 2);
5528 break;
5529
5530 // TODO:
5531 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5532 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5533 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5534
5535 case Intrinsic::x86_vcvtps2ph_128:
5536 case Intrinsic::x86_vcvtps2ph_256: {
5537 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5538 break;
5539 }
5540
5541 // Convert Packed Single Precision Floating-Point Values
5542 // to Packed Signed Doubleword Integer Values
5543 //
5544 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5545 // (<16 x float>, <16 x i32>, i16, i32)
5546 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5547 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5548 break;
5549
5550 // Convert Packed Double Precision Floating-Point Values
5551 // to Packed Single Precision Floating-Point Values
5552 case Intrinsic::x86_sse2_cvtpd2ps:
5553 case Intrinsic::x86_sse2_cvtps2dq:
5554 case Intrinsic::x86_sse2_cvtpd2dq:
5555 case Intrinsic::x86_sse2_cvttps2dq:
5556 case Intrinsic::x86_sse2_cvttpd2dq:
5557 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5558 case Intrinsic::x86_avx_cvt_ps2dq_256:
5559 case Intrinsic::x86_avx_cvt_pd2dq_256:
5560 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5561 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5562 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5563 break;
5564 }
5565
5566 // Convert Single-Precision FP Value to 16-bit FP Value
5567 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5568 // (<16 x float>, i32, <16 x i16>, i16)
5569 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5570 // (<4 x float>, i32, <8 x i16>, i8)
5571 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5572 // (<8 x float>, i32, <8 x i16>, i8)
5573 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5574 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5575 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5576 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5577 break;
5578
5579 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5580 case Intrinsic::x86_avx512_psll_w_512:
5581 case Intrinsic::x86_avx512_psll_d_512:
5582 case Intrinsic::x86_avx512_psll_q_512:
5583 case Intrinsic::x86_avx512_pslli_w_512:
5584 case Intrinsic::x86_avx512_pslli_d_512:
5585 case Intrinsic::x86_avx512_pslli_q_512:
5586 case Intrinsic::x86_avx512_psrl_w_512:
5587 case Intrinsic::x86_avx512_psrl_d_512:
5588 case Intrinsic::x86_avx512_psrl_q_512:
5589 case Intrinsic::x86_avx512_psra_w_512:
5590 case Intrinsic::x86_avx512_psra_d_512:
5591 case Intrinsic::x86_avx512_psra_q_512:
5592 case Intrinsic::x86_avx512_psrli_w_512:
5593 case Intrinsic::x86_avx512_psrli_d_512:
5594 case Intrinsic::x86_avx512_psrli_q_512:
5595 case Intrinsic::x86_avx512_psrai_w_512:
5596 case Intrinsic::x86_avx512_psrai_d_512:
5597 case Intrinsic::x86_avx512_psrai_q_512:
5598 case Intrinsic::x86_avx512_psra_q_256:
5599 case Intrinsic::x86_avx512_psra_q_128:
5600 case Intrinsic::x86_avx512_psrai_q_256:
5601 case Intrinsic::x86_avx512_psrai_q_128:
5602 case Intrinsic::x86_avx2_psll_w:
5603 case Intrinsic::x86_avx2_psll_d:
5604 case Intrinsic::x86_avx2_psll_q:
5605 case Intrinsic::x86_avx2_pslli_w:
5606 case Intrinsic::x86_avx2_pslli_d:
5607 case Intrinsic::x86_avx2_pslli_q:
5608 case Intrinsic::x86_avx2_psrl_w:
5609 case Intrinsic::x86_avx2_psrl_d:
5610 case Intrinsic::x86_avx2_psrl_q:
5611 case Intrinsic::x86_avx2_psra_w:
5612 case Intrinsic::x86_avx2_psra_d:
5613 case Intrinsic::x86_avx2_psrli_w:
5614 case Intrinsic::x86_avx2_psrli_d:
5615 case Intrinsic::x86_avx2_psrli_q:
5616 case Intrinsic::x86_avx2_psrai_w:
5617 case Intrinsic::x86_avx2_psrai_d:
5618 case Intrinsic::x86_sse2_psll_w:
5619 case Intrinsic::x86_sse2_psll_d:
5620 case Intrinsic::x86_sse2_psll_q:
5621 case Intrinsic::x86_sse2_pslli_w:
5622 case Intrinsic::x86_sse2_pslli_d:
5623 case Intrinsic::x86_sse2_pslli_q:
5624 case Intrinsic::x86_sse2_psrl_w:
5625 case Intrinsic::x86_sse2_psrl_d:
5626 case Intrinsic::x86_sse2_psrl_q:
5627 case Intrinsic::x86_sse2_psra_w:
5628 case Intrinsic::x86_sse2_psra_d:
5629 case Intrinsic::x86_sse2_psrli_w:
5630 case Intrinsic::x86_sse2_psrli_d:
5631 case Intrinsic::x86_sse2_psrli_q:
5632 case Intrinsic::x86_sse2_psrai_w:
5633 case Intrinsic::x86_sse2_psrai_d:
5634 case Intrinsic::x86_mmx_psll_w:
5635 case Intrinsic::x86_mmx_psll_d:
5636 case Intrinsic::x86_mmx_psll_q:
5637 case Intrinsic::x86_mmx_pslli_w:
5638 case Intrinsic::x86_mmx_pslli_d:
5639 case Intrinsic::x86_mmx_pslli_q:
5640 case Intrinsic::x86_mmx_psrl_w:
5641 case Intrinsic::x86_mmx_psrl_d:
5642 case Intrinsic::x86_mmx_psrl_q:
5643 case Intrinsic::x86_mmx_psra_w:
5644 case Intrinsic::x86_mmx_psra_d:
5645 case Intrinsic::x86_mmx_psrli_w:
5646 case Intrinsic::x86_mmx_psrli_d:
5647 case Intrinsic::x86_mmx_psrli_q:
5648 case Intrinsic::x86_mmx_psrai_w:
5649 case Intrinsic::x86_mmx_psrai_d:
5650 handleVectorShiftIntrinsic(I, /* Variable */ false);
5651 break;
5652 case Intrinsic::x86_avx2_psllv_d:
5653 case Intrinsic::x86_avx2_psllv_d_256:
5654 case Intrinsic::x86_avx512_psllv_d_512:
5655 case Intrinsic::x86_avx2_psllv_q:
5656 case Intrinsic::x86_avx2_psllv_q_256:
5657 case Intrinsic::x86_avx512_psllv_q_512:
5658 case Intrinsic::x86_avx2_psrlv_d:
5659 case Intrinsic::x86_avx2_psrlv_d_256:
5660 case Intrinsic::x86_avx512_psrlv_d_512:
5661 case Intrinsic::x86_avx2_psrlv_q:
5662 case Intrinsic::x86_avx2_psrlv_q_256:
5663 case Intrinsic::x86_avx512_psrlv_q_512:
5664 case Intrinsic::x86_avx2_psrav_d:
5665 case Intrinsic::x86_avx2_psrav_d_256:
5666 case Intrinsic::x86_avx512_psrav_d_512:
5667 case Intrinsic::x86_avx512_psrav_q_128:
5668 case Intrinsic::x86_avx512_psrav_q_256:
5669 case Intrinsic::x86_avx512_psrav_q_512:
5670 handleVectorShiftIntrinsic(I, /* Variable */ true);
5671 break;
5672
5673 // Pack with Signed/Unsigned Saturation
5674 case Intrinsic::x86_sse2_packsswb_128:
5675 case Intrinsic::x86_sse2_packssdw_128:
5676 case Intrinsic::x86_sse2_packuswb_128:
5677 case Intrinsic::x86_sse41_packusdw:
5678 case Intrinsic::x86_avx2_packsswb:
5679 case Intrinsic::x86_avx2_packssdw:
5680 case Intrinsic::x86_avx2_packuswb:
5681 case Intrinsic::x86_avx2_packusdw:
5682 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5683 // (<32 x i16> %a, <32 x i16> %b)
5684 // <32 x i16> @llvm.x86.avx512.packssdw.512
5685 // (<16 x i32> %a, <16 x i32> %b)
5686 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5687 case Intrinsic::x86_avx512_packsswb_512:
5688 case Intrinsic::x86_avx512_packssdw_512:
5689 case Intrinsic::x86_avx512_packuswb_512:
5690 case Intrinsic::x86_avx512_packusdw_512:
5691 handleVectorPackIntrinsic(I);
5692 break;
5693
5694 case Intrinsic::x86_sse41_pblendvb:
5695 case Intrinsic::x86_sse41_blendvpd:
5696 case Intrinsic::x86_sse41_blendvps:
5697 case Intrinsic::x86_avx_blendv_pd_256:
5698 case Intrinsic::x86_avx_blendv_ps_256:
5699 case Intrinsic::x86_avx2_pblendvb:
5700 handleBlendvIntrinsic(I);
5701 break;
5702
5703 case Intrinsic::x86_avx_dp_ps_256:
5704 case Intrinsic::x86_sse41_dppd:
5705 case Intrinsic::x86_sse41_dpps:
5706 handleDppIntrinsic(I);
5707 break;
5708
5709 case Intrinsic::x86_mmx_packsswb:
5710 case Intrinsic::x86_mmx_packuswb:
5711 handleVectorPackIntrinsic(I, 16);
5712 break;
5713
5714 case Intrinsic::x86_mmx_packssdw:
5715 handleVectorPackIntrinsic(I, 32);
5716 break;
5717
5718 case Intrinsic::x86_mmx_psad_bw:
5719 handleVectorSadIntrinsic(I, true);
5720 break;
5721 case Intrinsic::x86_sse2_psad_bw:
5722 case Intrinsic::x86_avx2_psad_bw:
5723 handleVectorSadIntrinsic(I);
5724 break;
5725
5726 // Multiply and Add Packed Words
5727 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5728 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5729 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5730 //
5731 // Multiply and Add Packed Signed and Unsigned Bytes
5732 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5733 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5734 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5735 //
5736 // These intrinsics are auto-upgraded into non-masked forms:
5737 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5738 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5739 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5740 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5741 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5742 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5743 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5744 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5745 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5746 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5747 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5748 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5749 case Intrinsic::x86_sse2_pmadd_wd:
5750 case Intrinsic::x86_avx2_pmadd_wd:
5751 case Intrinsic::x86_avx512_pmaddw_d_512:
5752 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5753 case Intrinsic::x86_avx2_pmadd_ub_sw:
5754 case Intrinsic::x86_avx512_pmaddubs_w_512:
5755 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2);
5756 break;
5757
5758 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5759 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5760 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/8);
5761 break;
5762
5763 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5764 case Intrinsic::x86_mmx_pmadd_wd:
5765 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5766 break;
5767
5768 // AVX Vector Neural Network Instructions: bytes
5769 //
5770 // Multiply and Add Packed Signed and Unsigned Bytes
5771 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
5772 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5773 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
5774 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5775 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
5776 // (<16 x i32>, <64 x i8>, <64 x i8>)
5777 //
5778 // Multiply and Add Unsigned and Signed Bytes With Saturation
5779 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
5780 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5781 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
5782 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5783 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
5784 // (<16 x i32>, <64 x i8>, <64 x i8>)
5785 //
5786 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5787 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5788 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5789 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5790 //
5791 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
5792 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5793 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
5794 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5795 //
5796 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
5797 // (<16 x i32>, <16 x i32>, <16 x i32>)
5798 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
5799 // (<16 x i32>, <16 x i32>, <16 x i32>)
5800 //
5801 // These intrinsics are auto-upgraded into non-masked forms:
5802 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
5803 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5804 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
5805 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5806 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
5807 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5808 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
5809 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5810 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
5811 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5812 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
5813 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5814 //
5815 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
5816 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5817 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
5818 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5819 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
5820 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5821 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
5822 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5823 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
5824 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5825 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
5826 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5827 case Intrinsic::x86_avx512_vpdpbusd_128:
5828 case Intrinsic::x86_avx512_vpdpbusd_256:
5829 case Intrinsic::x86_avx512_vpdpbusd_512:
5830 case Intrinsic::x86_avx512_vpdpbusds_128:
5831 case Intrinsic::x86_avx512_vpdpbusds_256:
5832 case Intrinsic::x86_avx512_vpdpbusds_512:
5833 case Intrinsic::x86_avx2_vpdpbssd_128:
5834 case Intrinsic::x86_avx2_vpdpbssd_256:
5835 case Intrinsic::x86_avx10_vpdpbssd_512:
5836 case Intrinsic::x86_avx2_vpdpbssds_128:
5837 case Intrinsic::x86_avx2_vpdpbssds_256:
5838 case Intrinsic::x86_avx10_vpdpbssds_512:
5839 case Intrinsic::x86_avx2_vpdpbsud_128:
5840 case Intrinsic::x86_avx2_vpdpbsud_256:
5841 case Intrinsic::x86_avx10_vpdpbsud_512:
5842 case Intrinsic::x86_avx2_vpdpbsuds_128:
5843 case Intrinsic::x86_avx2_vpdpbsuds_256:
5844 case Intrinsic::x86_avx10_vpdpbsuds_512:
5845 case Intrinsic::x86_avx2_vpdpbuud_128:
5846 case Intrinsic::x86_avx2_vpdpbuud_256:
5847 case Intrinsic::x86_avx10_vpdpbuud_512:
5848 case Intrinsic::x86_avx2_vpdpbuuds_128:
5849 case Intrinsic::x86_avx2_vpdpbuuds_256:
5850 case Intrinsic::x86_avx10_vpdpbuuds_512:
5851 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/4, /*EltSize=*/8);
5852 break;
5853
5854 // AVX Vector Neural Network Instructions: words
5855 //
5856 // Multiply and Add Signed Word Integers
5857 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
5858 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5859 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
5860 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5861 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
5862 // (<16 x i32>, <16 x i32>, <16 x i32>)
5863 //
5864 // Multiply and Add Signed Word Integers With Saturation
5865 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
5866 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5867 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
5868 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5869 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
5870 // (<16 x i32>, <16 x i32>, <16 x i32>)
5871 //
5872 // These intrinsics are auto-upgraded into non-masked forms:
5873 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
5874 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5875 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
5876 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5877 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
5878 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5879 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
5880 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5881 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
5882 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5883 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
5884 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5885 //
5886 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
5887 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5888 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
5889 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5890 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
5891 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5892 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
5893 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5894 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
5895 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5896 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
5897 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5898 case Intrinsic::x86_avx512_vpdpwssd_128:
5899 case Intrinsic::x86_avx512_vpdpwssd_256:
5900 case Intrinsic::x86_avx512_vpdpwssd_512:
5901 case Intrinsic::x86_avx512_vpdpwssds_128:
5902 case Intrinsic::x86_avx512_vpdpwssds_256:
5903 case Intrinsic::x86_avx512_vpdpwssds_512:
5904 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5905 break;
5906
5907 // TODO: Dot Product of BF16 Pairs Accumulated Into Packed Single
5908 // Precision
5909 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
5910 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
5911 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
5912 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
5913 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
5914 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
5915 // handleVectorPmaddIntrinsic() currently only handles integer types.
5916
5917 case Intrinsic::x86_sse_cmp_ss:
5918 case Intrinsic::x86_sse2_cmp_sd:
5919 case Intrinsic::x86_sse_comieq_ss:
5920 case Intrinsic::x86_sse_comilt_ss:
5921 case Intrinsic::x86_sse_comile_ss:
5922 case Intrinsic::x86_sse_comigt_ss:
5923 case Intrinsic::x86_sse_comige_ss:
5924 case Intrinsic::x86_sse_comineq_ss:
5925 case Intrinsic::x86_sse_ucomieq_ss:
5926 case Intrinsic::x86_sse_ucomilt_ss:
5927 case Intrinsic::x86_sse_ucomile_ss:
5928 case Intrinsic::x86_sse_ucomigt_ss:
5929 case Intrinsic::x86_sse_ucomige_ss:
5930 case Intrinsic::x86_sse_ucomineq_ss:
5931 case Intrinsic::x86_sse2_comieq_sd:
5932 case Intrinsic::x86_sse2_comilt_sd:
5933 case Intrinsic::x86_sse2_comile_sd:
5934 case Intrinsic::x86_sse2_comigt_sd:
5935 case Intrinsic::x86_sse2_comige_sd:
5936 case Intrinsic::x86_sse2_comineq_sd:
5937 case Intrinsic::x86_sse2_ucomieq_sd:
5938 case Intrinsic::x86_sse2_ucomilt_sd:
5939 case Intrinsic::x86_sse2_ucomile_sd:
5940 case Intrinsic::x86_sse2_ucomigt_sd:
5941 case Intrinsic::x86_sse2_ucomige_sd:
5942 case Intrinsic::x86_sse2_ucomineq_sd:
5943 handleVectorCompareScalarIntrinsic(I);
5944 break;
5945
5946 case Intrinsic::x86_avx_cmp_pd_256:
5947 case Intrinsic::x86_avx_cmp_ps_256:
5948 case Intrinsic::x86_sse2_cmp_pd:
5949 case Intrinsic::x86_sse_cmp_ps:
5950 handleVectorComparePackedIntrinsic(I);
5951 break;
5952
5953 case Intrinsic::x86_bmi_bextr_32:
5954 case Intrinsic::x86_bmi_bextr_64:
5955 case Intrinsic::x86_bmi_bzhi_32:
5956 case Intrinsic::x86_bmi_bzhi_64:
5957 case Intrinsic::x86_bmi_pdep_32:
5958 case Intrinsic::x86_bmi_pdep_64:
5959 case Intrinsic::x86_bmi_pext_32:
5960 case Intrinsic::x86_bmi_pext_64:
5961 handleBmiIntrinsic(I);
5962 break;
5963
5964 case Intrinsic::x86_pclmulqdq:
5965 case Intrinsic::x86_pclmulqdq_256:
5966 case Intrinsic::x86_pclmulqdq_512:
5967 handlePclmulIntrinsic(I);
5968 break;
5969
5970 case Intrinsic::x86_avx_round_pd_256:
5971 case Intrinsic::x86_avx_round_ps_256:
5972 case Intrinsic::x86_sse41_round_pd:
5973 case Intrinsic::x86_sse41_round_ps:
5974 handleRoundPdPsIntrinsic(I);
5975 break;
5976
5977 case Intrinsic::x86_sse41_round_sd:
5978 case Intrinsic::x86_sse41_round_ss:
5979 handleUnarySdSsIntrinsic(I);
5980 break;
5981
5982 case Intrinsic::x86_sse2_max_sd:
5983 case Intrinsic::x86_sse_max_ss:
5984 case Intrinsic::x86_sse2_min_sd:
5985 case Intrinsic::x86_sse_min_ss:
5986 handleBinarySdSsIntrinsic(I);
5987 break;
5988
5989 case Intrinsic::x86_avx_vtestc_pd:
5990 case Intrinsic::x86_avx_vtestc_pd_256:
5991 case Intrinsic::x86_avx_vtestc_ps:
5992 case Intrinsic::x86_avx_vtestc_ps_256:
5993 case Intrinsic::x86_avx_vtestnzc_pd:
5994 case Intrinsic::x86_avx_vtestnzc_pd_256:
5995 case Intrinsic::x86_avx_vtestnzc_ps:
5996 case Intrinsic::x86_avx_vtestnzc_ps_256:
5997 case Intrinsic::x86_avx_vtestz_pd:
5998 case Intrinsic::x86_avx_vtestz_pd_256:
5999 case Intrinsic::x86_avx_vtestz_ps:
6000 case Intrinsic::x86_avx_vtestz_ps_256:
6001 case Intrinsic::x86_avx_ptestc_256:
6002 case Intrinsic::x86_avx_ptestnzc_256:
6003 case Intrinsic::x86_avx_ptestz_256:
6004 case Intrinsic::x86_sse41_ptestc:
6005 case Intrinsic::x86_sse41_ptestnzc:
6006 case Intrinsic::x86_sse41_ptestz:
6007 handleVtestIntrinsic(I);
6008 break;
6009
6010 // Packed Horizontal Add/Subtract
6011 case Intrinsic::x86_ssse3_phadd_w:
6012 case Intrinsic::x86_ssse3_phadd_w_128:
6013 case Intrinsic::x86_avx2_phadd_w:
6014 case Intrinsic::x86_ssse3_phsub_w:
6015 case Intrinsic::x86_ssse3_phsub_w_128:
6016 case Intrinsic::x86_avx2_phsub_w: {
6017 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
6018 break;
6019 }
6020
6021 // Packed Horizontal Add/Subtract
6022 case Intrinsic::x86_ssse3_phadd_d:
6023 case Intrinsic::x86_ssse3_phadd_d_128:
6024 case Intrinsic::x86_avx2_phadd_d:
6025 case Intrinsic::x86_ssse3_phsub_d:
6026 case Intrinsic::x86_ssse3_phsub_d_128:
6027 case Intrinsic::x86_avx2_phsub_d: {
6028 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/32);
6029 break;
6030 }
6031
6032 // Packed Horizontal Add/Subtract and Saturate
6033 case Intrinsic::x86_ssse3_phadd_sw:
6034 case Intrinsic::x86_ssse3_phadd_sw_128:
6035 case Intrinsic::x86_avx2_phadd_sw:
6036 case Intrinsic::x86_ssse3_phsub_sw:
6037 case Intrinsic::x86_ssse3_phsub_sw_128:
6038 case Intrinsic::x86_avx2_phsub_sw: {
6039 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
6040 break;
6041 }
6042
6043 // Packed Single/Double Precision Floating-Point Horizontal Add
6044 case Intrinsic::x86_sse3_hadd_ps:
6045 case Intrinsic::x86_sse3_hadd_pd:
6046 case Intrinsic::x86_avx_hadd_pd_256:
6047 case Intrinsic::x86_avx_hadd_ps_256:
6048 case Intrinsic::x86_sse3_hsub_ps:
6049 case Intrinsic::x86_sse3_hsub_pd:
6050 case Intrinsic::x86_avx_hsub_pd_256:
6051 case Intrinsic::x86_avx_hsub_ps_256: {
6052 handlePairwiseShadowOrIntrinsic(I);
6053 break;
6054 }
6055
6056 case Intrinsic::x86_avx_maskstore_ps:
6057 case Intrinsic::x86_avx_maskstore_pd:
6058 case Intrinsic::x86_avx_maskstore_ps_256:
6059 case Intrinsic::x86_avx_maskstore_pd_256:
6060 case Intrinsic::x86_avx2_maskstore_d:
6061 case Intrinsic::x86_avx2_maskstore_q:
6062 case Intrinsic::x86_avx2_maskstore_d_256:
6063 case Intrinsic::x86_avx2_maskstore_q_256: {
6064 handleAVXMaskedStore(I);
6065 break;
6066 }
6067
6068 case Intrinsic::x86_avx_maskload_ps:
6069 case Intrinsic::x86_avx_maskload_pd:
6070 case Intrinsic::x86_avx_maskload_ps_256:
6071 case Intrinsic::x86_avx_maskload_pd_256:
6072 case Intrinsic::x86_avx2_maskload_d:
6073 case Intrinsic::x86_avx2_maskload_q:
6074 case Intrinsic::x86_avx2_maskload_d_256:
6075 case Intrinsic::x86_avx2_maskload_q_256: {
6076 handleAVXMaskedLoad(I);
6077 break;
6078 }
6079
6080 // Packed
6081 case Intrinsic::x86_avx512fp16_add_ph_512:
6082 case Intrinsic::x86_avx512fp16_sub_ph_512:
6083 case Intrinsic::x86_avx512fp16_mul_ph_512:
6084 case Intrinsic::x86_avx512fp16_div_ph_512:
6085 case Intrinsic::x86_avx512fp16_max_ph_512:
6086 case Intrinsic::x86_avx512fp16_min_ph_512:
6087 case Intrinsic::x86_avx512_min_ps_512:
6088 case Intrinsic::x86_avx512_min_pd_512:
6089 case Intrinsic::x86_avx512_max_ps_512:
6090 case Intrinsic::x86_avx512_max_pd_512: {
6091 // These AVX512 variants contain the rounding mode as a trailing flag.
6092 // Earlier variants do not have a trailing flag and are already handled
6093 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6094 // maybeHandleUnknownIntrinsic.
6095 [[maybe_unused]] bool Success =
6096 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6097 assert(Success);
6098 break;
6099 }
6100
6101 case Intrinsic::x86_avx_vpermilvar_pd:
6102 case Intrinsic::x86_avx_vpermilvar_pd_256:
6103 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6104 case Intrinsic::x86_avx_vpermilvar_ps:
6105 case Intrinsic::x86_avx_vpermilvar_ps_256:
6106 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6107 handleAVXVpermilvar(I);
6108 break;
6109 }
6110
6111 case Intrinsic::x86_avx512_vpermi2var_d_128:
6112 case Intrinsic::x86_avx512_vpermi2var_d_256:
6113 case Intrinsic::x86_avx512_vpermi2var_d_512:
6114 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6115 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6116 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6117 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6118 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6119 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6120 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6121 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6122 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6123 case Intrinsic::x86_avx512_vpermi2var_q_128:
6124 case Intrinsic::x86_avx512_vpermi2var_q_256:
6125 case Intrinsic::x86_avx512_vpermi2var_q_512:
6126 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6127 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6128 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6129 handleAVXVpermi2var(I);
6130 break;
6131
6132 // Packed Shuffle
6133 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6134 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6135 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6136 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6137 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6138 //
6139 // The following intrinsics are auto-upgraded:
6140 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6141 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6142 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6143 case Intrinsic::x86_avx2_pshuf_b:
6144 case Intrinsic::x86_sse_pshuf_w:
6145 case Intrinsic::x86_ssse3_pshuf_b_128:
6146 case Intrinsic::x86_ssse3_pshuf_b:
6147 case Intrinsic::x86_avx512_pshuf_b_512:
6148 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6149 /*trailingVerbatimArgs=*/1);
6150 break;
6151
6152 // AVX512 PMOV: Packed MOV, with truncation
6153 // Precisely handled by applying the same intrinsic to the shadow
6154 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6155 case Intrinsic::x86_avx512_mask_pmov_db_512:
6156 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6157 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6158 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6159 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6160 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6161 /*trailingVerbatimArgs=*/1);
6162 break;
6163 }
6164
6165 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6166 // Approximately handled using the corresponding truncation intrinsic
6167 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6168 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6169 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6170 handleIntrinsicByApplyingToShadow(I,
6171 Intrinsic::x86_avx512_mask_pmov_dw_512,
6172 /* trailingVerbatimArgs=*/1);
6173 break;
6174 }
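// For example, the shadow of a saturating pmovs.dw.512 call is computed by
// applying the plain truncating pmov.dw.512 to the argument shadows, with
// the trailing mask operand passed through verbatim. Truncating the shadow
// discards the shadow of the high-order source bits even though saturation
// makes the result depend on them, hence this is only an approximation.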
6175
6176 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6177 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6178 handleIntrinsicByApplyingToShadow(I,
6179 Intrinsic::x86_avx512_mask_pmov_db_512,
6180 /* trailingVerbatimArgs=*/1);
6181 break;
6182 }
6183
6184 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6185 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6186 handleIntrinsicByApplyingToShadow(I,
6187 Intrinsic::x86_avx512_mask_pmov_qb_512,
6188 /* trailingVerbatimArgs=*/1);
6189 break;
6190 }
6191
6192 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6193 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6194 handleIntrinsicByApplyingToShadow(I,
6195 Intrinsic::x86_avx512_mask_pmov_qw_512,
6196 /* trailingVerbatimArgs=*/1);
6197 break;
6198 }
6199
6200 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6201 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6202 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6203 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6204 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6205 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6206 // slow-path handler.
6207 handleAVX512VectorDownConvert(I);
6208 break;
6209 }
6210
6211 // AVX512/AVX10 Reciprocal Square Root
6212 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6213 // (<16 x float>, <16 x float>, i16)
6214 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6215 // (<8 x float>, <8 x float>, i8)
6216 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6217 // (<4 x float>, <4 x float>, i8)
6218 //
6219 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6220 // (<8 x double>, <8 x double>, i8)
6221 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6222 // (<4 x double>, <4 x double>, i8)
6223 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6224 // (<2 x double>, <2 x double>, i8)
6225 //
6226 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6227 // (<32 x bfloat>, <32 x bfloat>, i32)
6228 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6229 // (<16 x bfloat>, <16 x bfloat>, i16)
6230 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6231 // (<8 x bfloat>, <8 x bfloat>, i8)
6232 //
6233 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6234 // (<32 x half>, <32 x half>, i32)
6235 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6236 // (<16 x half>, <16 x half>, i16)
6237 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6238 // (<8 x half>, <8 x half>, i8)
6239 //
6240 // TODO: 3-operand variants are not handled:
6241 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6242 // (<2 x double>, <2 x double>, <2 x double>, i8)
6243 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6244 // (<4 x float>, <4 x float>, <4 x float>, i8)
6245 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6246 // (<8 x half>, <8 x half>, <8 x half>, i8)
6247 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6248 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6249 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6250 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6251 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6252 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6253 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6254 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6255 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6256 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6257 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6258 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6259 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6260 /*MaskIndex=*/2);
6261 break;
6262
6263 // AVX512/AVX10 Reciprocal
6264 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6265 // (<16 x float>, <16 x float>, i16)
6266 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6267 // (<8 x float>, <8 x float>, i8)
6268 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6269 // (<4 x float>, <4 x float>, i8)
6270 //
6271 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6272 // (<8 x double>, <8 x double>, i8)
6273 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6274 // (<4 x double>, <4 x double>, i8)
6275 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6276 // (<2 x double>, <2 x double>, i8)
6277 //
6278 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6279 // (<32 x bfloat>, <32 x bfloat>, i32)
6280 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6281 // (<16 x bfloat>, <16 x bfloat>, i16)
6282 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6283 // (<8 x bfloat>, <8 x bfloat>, i8)
6284 //
6285 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6286 // (<32 x half>, <32 x half>, i32)
6287 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6288 // (<16 x half>, <16 x half>, i16)
6289 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6290 // (<8 x half>, <8 x half>, i8)
6291 //
6292 // TODO: 3-operand variants are not handled:
6293 // <2 x double> @llvm.x86.avx512.rcp14.sd
6294 // (<2 x double>, <2 x double>, <2 x double>, i8)
6295 // <4 x float> @llvm.x86.avx512.rcp14.ss
6296 // (<4 x float>, <4 x float>, <4 x float>, i8)
6297 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6298 // (<8 x half>, <8 x half>, <8 x half>, i8)
6299 case Intrinsic::x86_avx512_rcp14_ps_512:
6300 case Intrinsic::x86_avx512_rcp14_ps_256:
6301 case Intrinsic::x86_avx512_rcp14_ps_128:
6302 case Intrinsic::x86_avx512_rcp14_pd_512:
6303 case Intrinsic::x86_avx512_rcp14_pd_256:
6304 case Intrinsic::x86_avx512_rcp14_pd_128:
6305 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6306 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6307 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6308 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6309 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6310 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6311 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6312 /*MaskIndex=*/2);
6313 break;
6314
6315 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6316 // (<32 x half>, i32, <32 x half>, i32, i32)
6317 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6318 // (<16 x half>, i32, <16 x half>, i32, i16)
6319 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6320 // (<8 x half>, i32, <8 x half>, i32, i8)
6321 //
6322 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6323 // (<16 x float>, i32, <16 x float>, i16, i32)
6324 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6325 // (<8 x float>, i32, <8 x float>, i8)
6326 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6327 // (<4 x float>, i32, <4 x float>, i8)
6328 //
6329 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6330 // (<8 x double>, i32, <8 x double>, i8, i32)
6331 // A Imm WriteThru Mask Rounding
6332 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6333 // (<4 x double>, i32, <4 x double>, i8)
6334 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6335 // (<2 x double>, i32, <2 x double>, i8)
6336 // A Imm WriteThru Mask
6337 //
6338 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6339 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6340 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6341 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6342 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6343 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6344 //
6345 // Not supported: three vectors
6346 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6347 // (<8 x half>, <8 x half>, <8 x half>, i8, i32, i32)
6348 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6349 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6350 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6351 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6352 // i32)
6353 // A B WriteThru Mask Imm
6354 // Rounding
6355 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6356 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6357 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6358 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6359 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6360 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6361 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6362 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6363 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6364 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6365 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6366 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6367 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6368 /*MaskIndex=*/3);
6369 break;
6370
6371 // AVX512 FP16 Arithmetic
6372 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6373 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6374 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6375 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6376 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6377 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6378 visitGenericScalarHalfwordInst(I);
6379 break;
6380 }
6381
6382 // AVX Galois Field New Instructions
6383 case Intrinsic::x86_vgf2p8affineqb_128:
6384 case Intrinsic::x86_vgf2p8affineqb_256:
6385 case Intrinsic::x86_vgf2p8affineqb_512:
6386 handleAVXGF2P8Affine(I);
6387 break;
6388
6389 default:
6390 return false;
6391 }
6392
6393 return true;
6394 }
6395
6396 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6397 switch (I.getIntrinsicID()) {
6398 case Intrinsic::aarch64_neon_rshrn:
6399 case Intrinsic::aarch64_neon_sqrshl:
6400 case Intrinsic::aarch64_neon_sqrshrn:
6401 case Intrinsic::aarch64_neon_sqrshrun:
6402 case Intrinsic::aarch64_neon_sqshl:
6403 case Intrinsic::aarch64_neon_sqshlu:
6404 case Intrinsic::aarch64_neon_sqshrn:
6405 case Intrinsic::aarch64_neon_sqshrun:
6406 case Intrinsic::aarch64_neon_srshl:
6407 case Intrinsic::aarch64_neon_sshl:
6408 case Intrinsic::aarch64_neon_uqrshl:
6409 case Intrinsic::aarch64_neon_uqrshrn:
6410 case Intrinsic::aarch64_neon_uqshl:
6411 case Intrinsic::aarch64_neon_uqshrn:
6412 case Intrinsic::aarch64_neon_urshl:
6413 case Intrinsic::aarch64_neon_ushl:
6414 // Not handled here: aarch64_neon_vsli (vector shift left and insert)
6415 handleVectorShiftIntrinsic(I, /* Variable */ false);
6416 break;
6417
6418 // TODO: handling max/min similarly to AND/OR may be more precise
6419 // Floating-Point Maximum/Minimum Pairwise
6420 case Intrinsic::aarch64_neon_fmaxp:
6421 case Intrinsic::aarch64_neon_fminp:
6422 // Floating-Point Maximum/Minimum Number Pairwise
6423 case Intrinsic::aarch64_neon_fmaxnmp:
6424 case Intrinsic::aarch64_neon_fminnmp:
6425 // Signed/Unsigned Maximum/Minimum Pairwise
6426 case Intrinsic::aarch64_neon_smaxp:
6427 case Intrinsic::aarch64_neon_sminp:
6428 case Intrinsic::aarch64_neon_umaxp:
6429 case Intrinsic::aarch64_neon_uminp:
6430 // Add Pairwise
6431 case Intrinsic::aarch64_neon_addp:
6432 // Floating-point Add Pairwise
6433 case Intrinsic::aarch64_neon_faddp:
6434 // Add Long Pairwise
6435 case Intrinsic::aarch64_neon_saddlp:
6436 case Intrinsic::aarch64_neon_uaddlp: {
6437 handlePairwiseShadowOrIntrinsic(I);
6438 break;
6439 }
6440
6441 // Floating-point Convert to integer, rounding to nearest with ties to Away
6442 case Intrinsic::aarch64_neon_fcvtas:
6443 case Intrinsic::aarch64_neon_fcvtau:
6444 // Floating-point convert to integer, rounding toward minus infinity
6445 case Intrinsic::aarch64_neon_fcvtms:
6446 case Intrinsic::aarch64_neon_fcvtmu:
6447 // Floating-point convert to integer, rounding to nearest with ties to even
6448 case Intrinsic::aarch64_neon_fcvtns:
6449 case Intrinsic::aarch64_neon_fcvtnu:
6450 // Floating-point convert to integer, rounding toward plus infinity
6451 case Intrinsic::aarch64_neon_fcvtps:
6452 case Intrinsic::aarch64_neon_fcvtpu:
6453 // Floating-point Convert to integer, rounding toward Zero
6454 case Intrinsic::aarch64_neon_fcvtzs:
6455 case Intrinsic::aarch64_neon_fcvtzu:
6456 // Floating-point convert to lower precision narrow, rounding to odd
6457 case Intrinsic::aarch64_neon_fcvtxn: {
6458 handleNEONVectorConvertIntrinsic(I);
6459 break;
6460 }
6461
6462 // Add reduction to scalar
6463 case Intrinsic::aarch64_neon_faddv:
6464 case Intrinsic::aarch64_neon_saddv:
6465 case Intrinsic::aarch64_neon_uaddv:
6466 // Signed/Unsigned min/max (Vector)
6467 // TODO: handling similarly to AND/OR may be more precise.
6468 case Intrinsic::aarch64_neon_smaxv:
6469 case Intrinsic::aarch64_neon_sminv:
6470 case Intrinsic::aarch64_neon_umaxv:
6471 case Intrinsic::aarch64_neon_uminv:
6472 // Floating-point min/max (vector)
6473 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6474 // but our shadow propagation is the same.
6475 case Intrinsic::aarch64_neon_fmaxv:
6476 case Intrinsic::aarch64_neon_fminv:
6477 case Intrinsic::aarch64_neon_fmaxnmv:
6478 case Intrinsic::aarch64_neon_fminnmv:
6479 // Sum long across vector
6480 case Intrinsic::aarch64_neon_saddlv:
6481 case Intrinsic::aarch64_neon_uaddlv:
6482 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6483 break;
6484
6485 case Intrinsic::aarch64_neon_ld1x2:
6486 case Intrinsic::aarch64_neon_ld1x3:
6487 case Intrinsic::aarch64_neon_ld1x4:
6488 case Intrinsic::aarch64_neon_ld2:
6489 case Intrinsic::aarch64_neon_ld3:
6490 case Intrinsic::aarch64_neon_ld4:
6491 case Intrinsic::aarch64_neon_ld2r:
6492 case Intrinsic::aarch64_neon_ld3r:
6493 case Intrinsic::aarch64_neon_ld4r: {
6494 handleNEONVectorLoad(I, /*WithLane=*/false);
6495 break;
6496 }
6497
6498 case Intrinsic::aarch64_neon_ld2lane:
6499 case Intrinsic::aarch64_neon_ld3lane:
6500 case Intrinsic::aarch64_neon_ld4lane: {
6501 handleNEONVectorLoad(I, /*WithLane=*/true);
6502 break;
6503 }
6504
6505 // Saturating extract narrow
6506 case Intrinsic::aarch64_neon_sqxtn:
6507 case Intrinsic::aarch64_neon_sqxtun:
6508 case Intrinsic::aarch64_neon_uqxtn:
6509 // These only have one argument, but we (ab)use handleShadowOr because it
6510 // does work on single argument intrinsics and will typecast the shadow
6511 // (and update the origin).
6512 handleShadowOr(I);
6513 break;
6514
6515 case Intrinsic::aarch64_neon_st1x2:
6516 case Intrinsic::aarch64_neon_st1x3:
6517 case Intrinsic::aarch64_neon_st1x4:
6518 case Intrinsic::aarch64_neon_st2:
6519 case Intrinsic::aarch64_neon_st3:
6520 case Intrinsic::aarch64_neon_st4: {
6521 handleNEONVectorStoreIntrinsic(I, false);
6522 break;
6523 }
6524
6525 case Intrinsic::aarch64_neon_st2lane:
6526 case Intrinsic::aarch64_neon_st3lane:
6527 case Intrinsic::aarch64_neon_st4lane: {
6528 handleNEONVectorStoreIntrinsic(I, true);
6529 break;
6530 }
6531
6532 // Arm NEON vector table intrinsics have the source/table register(s) as
6533 // arguments, followed by the index register. They return the output.
6534 //
6535 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6536 // original value unchanged in the destination register.'
6537 // Conveniently, zero denotes a clean shadow, which means out-of-range
6538 // indices for TBL will initialize the user data with zero and also clean
6539 // the shadow. (For TBX, neither the user data nor the shadow will be
6540 // updated, which is also correct.)
6541 case Intrinsic::aarch64_neon_tbl1:
6542 case Intrinsic::aarch64_neon_tbl2:
6543 case Intrinsic::aarch64_neon_tbl3:
6544 case Intrinsic::aarch64_neon_tbl4:
6545 case Intrinsic::aarch64_neon_tbx1:
6546 case Intrinsic::aarch64_neon_tbx2:
6547 case Intrinsic::aarch64_neon_tbx3:
6548 case Intrinsic::aarch64_neon_tbx4: {
6549 // The last trailing argument (index register) should be handled verbatim
6550 handleIntrinsicByApplyingToShadow(
6551 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6552 /*trailingVerbatimArgs*/ 1);
6553 break;
6554 }
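// For example, the shadow of tbl1(%table, %index) is computed roughly as
// tbl1(shadow(%table), %index): poisoned table bytes propagate into the
// selected result bytes, and out-of-range indices produce a zero (clean)
// shadow byte, matching the zero that TBL writes into the data.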
6555
6556 case Intrinsic::aarch64_neon_fmulx:
6557 case Intrinsic::aarch64_neon_pmul:
6558 case Intrinsic::aarch64_neon_pmull:
6559 case Intrinsic::aarch64_neon_smull:
6560 case Intrinsic::aarch64_neon_pmull64:
6561 case Intrinsic::aarch64_neon_umull: {
6562 handleNEONVectorMultiplyIntrinsic(I);
6563 break;
6564 }
6565
6566 default:
6567 return false;
6568 }
6569
6570 return true;
6571 }
6572
6573 void visitIntrinsicInst(IntrinsicInst &I) {
6574 if (maybeHandleCrossPlatformIntrinsic(I))
6575 return;
6576
6577 if (maybeHandleX86SIMDIntrinsic(I))
6578 return;
6579
6580 if (maybeHandleArmSIMDIntrinsic(I))
6581 return;
6582
6583 if (maybeHandleUnknownIntrinsic(I))
6584 return;
6585
6586 visitInstruction(I);
6587 }
6588
6589 void visitLibAtomicLoad(CallBase &CB) {
6590 // Since we use getNextNode here, we can't have CB terminate the BB.
6591 assert(isa<CallInst>(CB));
6592
6593 IRBuilder<> IRB(&CB);
6594 Value *Size = CB.getArgOperand(0);
6595 Value *SrcPtr = CB.getArgOperand(1);
6596 Value *DstPtr = CB.getArgOperand(2);
6597 Value *Ordering = CB.getArgOperand(3);
6598 // Convert the call to have at least Acquire ordering to make sure
6599 // the shadow operations aren't reordered before it.
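// (For example, a relaxed load ordering is upgraded to acquire, while
// acquire and seq_cst orderings are left unchanged.)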
6600 Value *NewOrdering =
6601 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
6602 CB.setArgOperand(3, NewOrdering);
6603
6604 NextNodeIRBuilder NextIRB(&CB);
6605 Value *SrcShadowPtr, *SrcOriginPtr;
6606 std::tie(SrcShadowPtr, SrcOriginPtr) =
6607 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6608 /*isStore*/ false);
6609 Value *DstShadowPtr =
6610 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6611 /*isStore*/ true)
6612 .first;
6613
6614 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
6615 if (MS.TrackOrigins) {
6616 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
6617 kMinOriginAlignment);
6618 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
6619 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
6620 }
6621 }
6622
6623 void visitLibAtomicStore(CallBase &CB) {
6624 IRBuilder<> IRB(&CB);
6625 Value *Size = CB.getArgOperand(0);
6626 Value *DstPtr = CB.getArgOperand(2);
6627 Value *Ordering = CB.getArgOperand(3);
6628 // Convert the call to have at least Release ordering to make sure
6629 // the shadow operations aren't reordered after it.
6630 Value *NewOrdering =
6631 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
6632 CB.setArgOperand(3, NewOrdering);
6633
6634 Value *DstShadowPtr =
6635 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
6636 /*isStore*/ true)
6637 .first;
6638
6639 // Atomic store always paints clean shadow/origin. See file header.
6640 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
6641 Align(1));
6642 }
6643
6644 void visitCallBase(CallBase &CB) {
6645 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
6646 if (CB.isInlineAsm()) {
6647 // For inline asm (either a call to asm function, or callbr instruction),
6648 // do the usual thing: check argument shadow and mark all outputs as
6649 // clean. Note that any side effects of the inline asm that are not
6650 // immediately visible in its constraints are not handled.
6651 if (ClHandleAsmConservative)
6652 visitAsmInstruction(CB);
6653 else
6654 visitInstruction(CB);
6655 return;
6656 }
6657 LibFunc LF;
6658 if (TLI->getLibFunc(CB, LF)) {
6659 // libatomic.a functions need to have special handling because there isn't
6660 // a good way to intercept them or compile the library with
6661 // instrumentation.
6662 switch (LF) {
6663 case LibFunc_atomic_load:
6664 if (!isa<CallInst>(CB)) {
6665 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load. "
6666 "Ignoring!\n";
6667 break;
6668 }
6669 visitLibAtomicLoad(CB);
6670 return;
6671 case LibFunc_atomic_store:
6672 visitLibAtomicStore(CB);
6673 return;
6674 default:
6675 break;
6676 }
6677 }
6678
6679 if (auto *Call = dyn_cast<CallInst>(&CB)) {
6680 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
6681
6682 // We are going to insert code that relies on the fact that the callee
6683 // will become a non-readonly function after it is instrumented by us. To
6684 // prevent this code from being optimized out, mark that function
6685 // non-readonly in advance.
6686 // TODO: We can likely do better than dropping memory() completely here.
6687 AttributeMask B;
6688 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
6689
6690 Call->removeFnAttrs(B);
6691 if (Function *Func = Call->getCalledFunction()) {
6692 Func->removeFnAttrs(B);
6693 }
6694
6695 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
6696 }
6697 IRBuilder<> IRB(&CB);
6698 bool MayCheckCall = MS.EagerChecks;
6699 if (Function *Func = CB.getCalledFunction()) {
6700 // __sanitizer_unaligned_{load,store} functions may be called by users
6701 // and always expect shadows in the TLS. So don't check them.
6702 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
6703 }
6704
6705 unsigned ArgOffset = 0;
6706 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
6707 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
6708 if (!A->getType()->isSized()) {
6709 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
6710 continue;
6711 }
6712
6713 if (A->getType()->isScalableTy()) {
6714 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
6715 // Handle as noundef, but don't reserve tls slots.
6716 insertCheckShadowOf(A, &CB);
6717 continue;
6718 }
6719
6720 unsigned Size = 0;
6721 const DataLayout &DL = F.getDataLayout();
6722
6723 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
6724 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
6725 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
6726
6727 if (EagerCheck) {
6728 insertCheckShadowOf(A, &CB);
6729 Size = DL.getTypeAllocSize(A->getType());
6730 } else {
6731 [[maybe_unused]] Value *Store = nullptr;
6732 // Compute the Shadow for arg even if it is ByVal, because
6733 // in that case getShadow() will copy the actual arg shadow to
6734 // __msan_param_tls.
6735 Value *ArgShadow = getShadow(A);
6736 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
6737 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
6738 << " Shadow: " << *ArgShadow << "\n");
6739 if (ByVal) {
6740 // ByVal requires some special handling as it's too big for a single
6741 // load
6742 assert(A->getType()->isPointerTy() &&
6743 "ByVal argument is not a pointer!");
6744 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
6745 if (ArgOffset + Size > kParamTLSSize)
6746 break;
6747 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
6748 MaybeAlign Alignment = std::nullopt;
6749 if (ParamAlignment)
6750 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
6751 Value *AShadowPtr, *AOriginPtr;
6752 std::tie(AShadowPtr, AOriginPtr) =
6753 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
6754 /*isStore*/ false);
6755 if (!PropagateShadow) {
6756 Store = IRB.CreateMemSet(ArgShadowBase,
6757 Constant::getNullValue(IRB.getInt8Ty()),
6758 Size, Alignment);
6759 } else {
6760 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
6761 Alignment, Size);
6762 if (MS.TrackOrigins) {
6763 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
6764 // FIXME: OriginSize should be:
6765 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
6766 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
6767 IRB.CreateMemCpy(
6768 ArgOriginBase,
6769 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
6770 AOriginPtr,
6771 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
6772 }
6773 }
6774 } else {
6775 // Any other parameters mean we need bit-grained tracking of uninit
6776 // data
6777 Size = DL.getTypeAllocSize(A->getType());
6778 if (ArgOffset + Size > kParamTLSSize)
6779 break;
6780 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
6781 kShadowTLSAlignment);
6782 Constant *Cst = dyn_cast<Constant>(ArgShadow);
6783 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
6784 IRB.CreateStore(getOrigin(A),
6785 getOriginPtrForArgument(IRB, ArgOffset));
6786 }
6787 }
6788 assert(Store != nullptr);
6789 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
6790 }
6791 assert(Size != 0);
6792 ArgOffset += alignTo(Size, kShadowTLSAlignment);
6793 }
6794 LLVM_DEBUG(dbgs() << " done with call args\n");
6795
6796 FunctionType *FT = CB.getFunctionType();
6797 if (FT->isVarArg()) {
6798 VAHelper->visitCallBase(CB, IRB);
6799 }
6800
6801 // Now, get the shadow for the RetVal.
6802 if (!CB.getType()->isSized())
6803 return;
6804 // Don't emit the epilogue for musttail call returns.
6805 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
6806 return;
6807
6808 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
6809 setShadow(&CB, getCleanShadow(&CB));
6810 setOrigin(&CB, getCleanOrigin());
6811 return;
6812 }
6813
6814 IRBuilder<> IRBBefore(&CB);
6815 // Until we have full dynamic coverage, make sure the retval shadow is 0.
6816 Value *Base = getShadowPtrForRetval(IRBBefore);
6817 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
6818 kShadowTLSAlignment);
6819 BasicBlock::iterator NextInsn;
6820 if (isa<CallInst>(CB)) {
6821 NextInsn = ++CB.getIterator();
6822 assert(NextInsn != CB.getParent()->end());
6823 } else {
6824 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
6825 if (!NormalDest->getSinglePredecessor()) {
6826 // FIXME: this case is tricky, so we are just conservative here.
6827 // Perhaps we need to split the edge between this BB and NormalDest,
6828 // but a naive attempt to use SplitEdge leads to a crash.
6829 setShadow(&CB, getCleanShadow(&CB));
6830 setOrigin(&CB, getCleanOrigin());
6831 return;
6832 }
6833 // FIXME: NextInsn is likely in a basic block that has not been visited
6834 // yet. Anything inserted there will be instrumented by MSan later!
6835 NextInsn = NormalDest->getFirstInsertionPt();
6836 assert(NextInsn != NormalDest->end() &&
6837 "Could not find insertion point for retval shadow load");
6838 }
6839 IRBuilder<> IRBAfter(&*NextInsn);
6840 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
6841 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
6842 "_msret");
6843 setShadow(&CB, RetvalShadow);
6844 if (MS.TrackOrigins)
6845 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
6846 }
6847
6848 bool isAMustTailRetVal(Value *RetVal) {
6849 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
6850 RetVal = I->getOperand(0);
6851 }
6852 if (auto *I = dyn_cast<CallInst>(RetVal)) {
6853 return I->isMustTailCall();
6854 }
6855 return false;
6856 }
6857
6858 void visitReturnInst(ReturnInst &I) {
6859 IRBuilder<> IRB(&I);
6860 Value *RetVal = I.getReturnValue();
6861 if (!RetVal)
6862 return;
6863 // Don't emit the epilogue for musttail call returns.
6864 if (isAMustTailRetVal(RetVal))
6865 return;
6866 Value *ShadowPtr = getShadowPtrForRetval(IRB);
6867 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
6868 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
6869 // FIXME: Consider using SpecialCaseList to specify a list of functions that
6870 // must always return fully initialized values. For now, we hardcode "main".
6871 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
6872
6873 Value *Shadow = getShadow(RetVal);
6874 bool StoreOrigin = true;
6875 if (EagerCheck) {
6876 insertCheckShadowOf(RetVal, &I);
6877 Shadow = getCleanShadow(RetVal);
6878 StoreOrigin = false;
6879 }
6880
6881 // The caller may still expect information passed over TLS if we pass our
6882 // check
6883 if (StoreShadow) {
6884 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
6885 if (MS.TrackOrigins && StoreOrigin)
6886 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
6887 }
6888 }
6889
6890 void visitPHINode(PHINode &I) {
6891 IRBuilder<> IRB(&I);
6892 if (!PropagateShadow) {
6893 setShadow(&I, getCleanShadow(&I));
6894 setOrigin(&I, getCleanOrigin());
6895 return;
6896 }
6897
6898 ShadowPHINodes.push_back(&I);
6899 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
6900 "_msphi_s"));
6901 if (MS.TrackOrigins)
6902 setOrigin(
6903 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
6904 }
6905
6906 Value *getLocalVarIdptr(AllocaInst &I) {
6907 ConstantInt *IntConst =
6908 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
6909 return new GlobalVariable(*F.getParent(), IntConst->getType(),
6910 /*isConstant=*/false, GlobalValue::PrivateLinkage,
6911 IntConst);
6912 }
6913
6914 Value *getLocalVarDescription(AllocaInst &I) {
6915 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
6916 }
6917
6918 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6919 if (PoisonStack && ClPoisonStackWithCall) {
6920 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
6921 } else {
6922 Value *ShadowBase, *OriginBase;
6923 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
6924 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
6925
6926 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
6927 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
6928 }
6929
6930 if (PoisonStack && MS.TrackOrigins) {
6931 Value *Idptr = getLocalVarIdptr(I);
6932 if (ClPrintStackNames) {
6933 Value *Descr = getLocalVarDescription(I);
6934 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
6935 {&I, Len, Idptr, Descr});
6936 } else {
6937 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
6938 }
6939 }
6940 }
6941
6942 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6943 Value *Descr = getLocalVarDescription(I);
6944 if (PoisonStack) {
6945 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
6946 } else {
6947 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
6948 }
6949 }
6950
6951 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
6952 if (!InsPoint)
6953 InsPoint = &I;
6954 NextNodeIRBuilder IRB(InsPoint);
6955 const DataLayout &DL = F.getDataLayout();
6956 TypeSize TS = DL.getTypeAllocSize(I.getAllocatedType());
6957 Value *Len = IRB.CreateTypeSize(MS.IntptrTy, TS);
6958 if (I.isArrayAllocation())
6959 Len = IRB.CreateMul(Len,
6960 IRB.CreateZExtOrTrunc(I.getArraySize(), MS.IntptrTy));
6961
6962 if (MS.CompileKernel)
6963 poisonAllocaKmsan(I, IRB, Len);
6964 else
6965 poisonAllocaUserspace(I, IRB, Len);
6966 }
6967
6968 void visitAllocaInst(AllocaInst &I) {
6969 setShadow(&I, getCleanShadow(&I));
6970 setOrigin(&I, getCleanOrigin());
6971 // We'll get to this alloca later unless it's poisoned at the corresponding
6972 // llvm.lifetime.start.
6973 AllocaSet.insert(&I);
6974 }
6975
6976 void visitSelectInst(SelectInst &I) {
6977 // a = select b, c, d
6978 Value *B = I.getCondition();
6979 Value *C = I.getTrueValue();
6980 Value *D = I.getFalseValue();
6981
6982 handleSelectLikeInst(I, B, C, D);
6983 }
6984
6985 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
6986 IRBuilder<> IRB(&I);
6987
6988 Value *Sb = getShadow(B);
6989 Value *Sc = getShadow(C);
6990 Value *Sd = getShadow(D);
6991
6992 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
6993 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
6994 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
6995
6996 // Result shadow if condition shadow is 0.
6997 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
6998 Value *Sa1;
6999 if (I.getType()->isAggregateType()) {
7000 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7001 // an extra "select". This results in much more compact IR.
7002 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7003 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7004 } else if (isScalableNonVectorType(I.getType())) {
7005 // This is intended to handle target("aarch64.svcount"), which can't be
7006 // handled in the else branch because of incompatibility with CreateXor
7007 // ("The supported LLVM operations on this type are limited to load,
7008 // store, phi, select and alloca instructions").
7009
7010 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7011 // branch as needed instead.
7012 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7013 } else {
7014 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7015 // If Sb (condition is poisoned), look for bits in c and d that are equal
7016 // and both unpoisoned.
7017 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
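// For example, with clean (all-zero) Sc and Sd, C = 0b1010 and D = 0b1000:
// C^D = 0b0010, so when the condition shadow is set only bit 1 of the
// result is poisoned; the remaining bits agree between c and d and stay
// clean no matter which operand the select picks.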
7018
7019 // Cast arguments to shadow-compatible type.
7020 C = CreateAppToShadowCast(IRB, C);
7021 D = CreateAppToShadowCast(IRB, D);
7022
7023 // Result shadow if condition shadow is 1.
7024 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7025 }
7026 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7027 setShadow(&I, Sa);
7028 if (MS.TrackOrigins) {
7029 // Origins are always i32, so any vector conditions must be flattened.
7030 // FIXME: consider tracking vector origins for app vectors?
7031 if (B->getType()->isVectorTy()) {
7032 B = convertToBool(B, IRB);
7033 Sb = convertToBool(Sb, IRB);
7034 }
7035 // a = select b, c, d
7036 // Oa = Sb ? Ob : (b ? Oc : Od)
7037 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7038 }
7039 }
7040
7041 void visitLandingPadInst(LandingPadInst &I) {
7042 // Do nothing.
7043 // See https://github.com/google/sanitizers/issues/504
7044 setShadow(&I, getCleanShadow(&I));
7045 setOrigin(&I, getCleanOrigin());
7046 }
7047
7048 void visitCatchSwitchInst(CatchSwitchInst &I) {
7049 setShadow(&I, getCleanShadow(&I));
7050 setOrigin(&I, getCleanOrigin());
7051 }
7052
7053 void visitFuncletPadInst(FuncletPadInst &I) {
7054 setShadow(&I, getCleanShadow(&I));
7055 setOrigin(&I, getCleanOrigin());
7056 }
7057
7058 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7059
7060 void visitExtractValueInst(ExtractValueInst &I) {
7061 IRBuilder<> IRB(&I);
7062 Value *Agg = I.getAggregateOperand();
7063 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7064 Value *AggShadow = getShadow(Agg);
7065 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7066 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7067 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7068 setShadow(&I, ResShadow);
7069 setOriginForNaryOp(I);
7070 }
7071
7072 void visitInsertValueInst(InsertValueInst &I) {
7073 IRBuilder<> IRB(&I);
7074 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7075 Value *AggShadow = getShadow(I.getAggregateOperand());
7076 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7077 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7078 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7079 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7080 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7081 setShadow(&I, Res);
7082 setOriginForNaryOp(I);
7083 }
7084
7085 void dumpInst(Instruction &I) {
7086 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7087 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7088 } else {
7089 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7090 }
7091 errs() << "QQQ " << I << "\n";
7092 }
7093
7094 void visitResumeInst(ResumeInst &I) {
7095 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7096 // Nothing to do here.
7097 }
7098
7099 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7100 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7101 // Nothing to do here.
7102 }
7103
7104 void visitCatchReturnInst(CatchReturnInst &CRI) {
7105 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7106 // Nothing to do here.
7107 }
7108
7109 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7110 IRBuilder<> &IRB, const DataLayout &DL,
7111 bool isOutput) {
7112 // For each assembly argument, we check its value for being initialized.
7113 // If the argument is a pointer, we assume it points to a single element
7114 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7115 // Each such pointer is instrumented with a call to the runtime library.
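// For example, for an "=m"(x) output constraint with elementtype(i32), the
// 4 bytes of shadow for the pointee are unpoisoned, either with a direct
// shadow store (userspace) or a call into the MSan runtime (kernel).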
7116 Type *OpType = Operand->getType();
7117 // Check the operand value itself.
7118 insertCheckShadowOf(Operand, &I);
7119 if (!OpType->isPointerTy() || !isOutput) {
7120 assert(!isOutput);
7121 return;
7122 }
7123 if (!ElemTy->isSized())
7124 return;
7125 auto Size = DL.getTypeStoreSize(ElemTy);
7126 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7127 if (MS.CompileKernel) {
7128 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7129 } else {
7130 // ElemTy, derived from elementtype(), does not encode the alignment of
7131 // the pointer. Conservatively assume that the shadow memory is unaligned.
7132 // When Size is large, avoid StoreInst as it would expand to many
7133 // instructions.
7134 auto [ShadowPtr, _] =
7135 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7136 if (Size <= 32)
7137 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7138 else
7139 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7140 SizeVal, Align(1));
7141 }
7142 }
7143
7144 /// Get the number of output arguments returned by pointers.
7145 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7146 int NumRetOutputs = 0;
7147 int NumOutputs = 0;
7148 Type *RetTy = cast<Value>(CB)->getType();
7149 if (!RetTy->isVoidTy()) {
7150 // Register outputs are returned via the CallInst return value.
7151 auto *ST = dyn_cast<StructType>(RetTy);
7152 if (ST)
7153 NumRetOutputs = ST->getNumElements();
7154 else
7155 NumRetOutputs = 1;
7156 }
7157 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7158 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7159 switch (Info.Type) {
7160 case InlineAsm::isOutput:
7161 NumOutputs++;
7162 break;
7163 default:
7164 break;
7165 }
7166 }
7167 return NumOutputs - NumRetOutputs;
7168 }
7169
7170 void visitAsmInstruction(Instruction &I) {
7171 // Conservative inline assembly handling: check for poisoned shadow of
7172 // asm() arguments, then unpoison the result and all the memory locations
7173 // pointed to by those arguments.
7174 // An inline asm() statement in C++ contains lists of input and output
7175 // arguments used by the assembly code. These are mapped to operands of the
7176 // CallInst as follows:
7177 // - nR register outputs ("=r") are returned by value in a single structure
7178 // (SSA value of the CallInst);
7179 // - nO other outputs ("=m" and others) are returned by pointer as first
7180 // nO operands of the CallInst;
7181 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7182 // remaining nI operands.
7183 // The total number of asm() arguments in the source is nR+nO+nI, and the
7184 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7185 // function to be called).
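// For example, for asm("..." : "=r"(a), "=m"(b) : "r"(c)):
// nR = 1 ('a' is the SSA result of the CallInst), nO = 1 (a pointer to 'b'
// is operand 0), nI = 1 ('c' is operand 1), and the final operand is the
// InlineAsm callee itself.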
7186 const DataLayout &DL = F.getDataLayout();
7187 CallBase *CB = cast<CallBase>(&I);
7188 IRBuilder<> IRB(&I);
7189 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7190 int OutputArgs = getNumOutputArgs(IA, CB);
7191 // The last operand of a CallInst is the function itself.
7192 int NumOperands = CB->getNumOperands() - 1;
7193
7194 // Check input arguments. Doing so before unpoisoning output arguments, so
7195 // that we won't overwrite uninit values before checking them.
7196 for (int i = OutputArgs; i < NumOperands; i++) {
7197 Value *Operand = CB->getOperand(i);
7198 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7199 /*isOutput*/ false);
7200 }
7201 // Unpoison output arguments. This must happen before the actual InlineAsm
7202 // call, so that the shadow for memory published in the asm() statement
7203 // remains valid.
7204 for (int i = 0; i < OutputArgs; i++) {
7205 Value *Operand = CB->getOperand(i);
7206 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7207 /*isOutput*/ true);
7208 }
7209
7210 setShadow(&I, getCleanShadow(&I));
7211 setOrigin(&I, getCleanOrigin());
7212 }
7213
7214 void visitFreezeInst(FreezeInst &I) {
7215 // Freeze always returns a fully defined value.
7216 setShadow(&I, getCleanShadow(&I));
7217 setOrigin(&I, getCleanOrigin());
7218 }
7219
7220 void visitInstruction(Instruction &I) {
7221 // Everything else: stop propagating and check for poisoned shadow.
7222 if (ClDumpStrictInstructions)
7223 dumpInst(I);
7224 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7225 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7226 Value *Operand = I.getOperand(i);
7227 if (Operand->getType()->isSized())
7228 insertCheckShadowOf(Operand, &I);
7229 }
7230 setShadow(&I, getCleanShadow(&I));
7231 setOrigin(&I, getCleanOrigin());
7232 }
7233};
7234
7235struct VarArgHelperBase : public VarArgHelper {
7236 Function &F;
7237 MemorySanitizer &MS;
7238 MemorySanitizerVisitor &MSV;
7239 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7240 const unsigned VAListTagSize;
7241
7242 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7243 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7244 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7245
7246 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7247 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7248 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7249 }
7250
7251 /// Compute the shadow address for a given va_arg.
7252 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7253 return IRB.CreatePtrAdd(
7254 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7255 }
7256
7257 /// Compute the shadow address for a given va_arg.
7258 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7259 unsigned ArgSize) {
7260 // Make sure we don't overflow __msan_va_arg_tls.
7261 if (ArgOffset + ArgSize > kParamTLSSize)
7262 return nullptr;
7263 return getShadowPtrForVAArgument(IRB, ArgOffset);
7264 }
7265
7266 /// Compute the origin address for a given va_arg.
7267 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7268 // getOriginPtrForVAArgument() is always called after
7269 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7270 // overflow.
7271 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7272 ConstantInt::get(MS.IntptrTy, ArgOffset),
7273 "_msarg_va_o");
7274 }
7275
7276 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7277 unsigned BaseOffset) {
7278 // The tail of __msan_va_arg_tls is not large enough to fit the full
7279 // value shadow, but it will be copied to the backup anyway. Make it
7280 // clean.
7281 if (BaseOffset >= kParamTLSSize)
7282 return;
7283 Value *TailSize =
7284 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7285 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7286 TailSize, Align(8));
7287 }
7288
7289 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7290 IRBuilder<> IRB(&I);
7291 Value *VAListTag = I.getArgOperand(0);
7292 const Align Alignment = Align(8);
7293 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7294 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7295 // Unpoison the whole __va_list_tag.
7296 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7297 VAListTagSize, Alignment, false);
7298 }
7299
7300 void visitVAStartInst(VAStartInst &I) override {
7301 if (F.getCallingConv() == CallingConv::Win64)
7302 return;
7303 VAStartInstrumentationList.push_back(&I);
7304 unpoisonVAListTagForInst(I);
7305 }
7306
7307 void visitVACopyInst(VACopyInst &I) override {
7308 if (F.getCallingConv() == CallingConv::Win64)
7309 return;
7310 unpoisonVAListTagForInst(I);
7311 }
7312};
7313
7314/// AMD64-specific implementation of VarArgHelper.
7315struct VarArgAMD64Helper : public VarArgHelperBase {
7316 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7317 // See a comment in visitCallBase for more details.
7318 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7319 static const unsigned AMD64FpEndOffsetSSE = 176;
7320 // If SSE is disabled, fp_offset in va_list is zero.
7321 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7322
7323 unsigned AMD64FpEndOffset;
7324 AllocaInst *VAArgTLSCopy = nullptr;
7325 AllocaInst *VAArgTLSOriginCopy = nullptr;
7326 Value *VAArgOverflowSize = nullptr;
7327
7328 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7329
7330 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7331 MemorySanitizerVisitor &MSV)
7332 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7333 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7334 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7335 if (Attr.isStringAttribute() &&
7336 (Attr.getKindAsString() == "target-features")) {
7337 if (Attr.getValueAsString().contains("-sse"))
7338 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7339 break;
7340 }
7341 }
7342 }
7343
7344 ArgKind classifyArgument(Value *arg) {
7345 // A very rough approximation of X86_64 argument classification rules.
7346 Type *T = arg->getType();
7347 if (T->isX86_FP80Ty())
7348 return AK_Memory;
7349 if (T->isFPOrFPVectorTy())
7350 return AK_FloatingPoint;
7351 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7352 return AK_GeneralPurpose;
7353 if (T->isPointerTy())
7354 return AK_GeneralPurpose;
7355 return AK_Memory;
7356 }
7357
7358 // For VarArg functions, store the argument shadow in an ABI-specific format
7359 // that corresponds to va_list layout.
7360 // We do this because Clang lowers va_arg in the frontend, and this pass
7361 // only sees the low level code that deals with va_list internals.
7362 // A much easier alternative (provided that Clang emits va_arg instructions)
7363 // would have been to associate each live instance of va_list with a copy of
7364 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7365 // order.
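// For example (with SSE enabled), for printf("%d %f", i, d) the shadow of
// the fixed format-string pointer advances GpOffset to 8 but is not stored;
// the shadow of the variadic i32 'i' is stored at byte offset 8 of
// __msan_va_arg_tls, the shadow of the double 'd' at offset 48 (the start
// of the FP save area), and stack-passed arguments go after offset 176.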
7366 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7367 unsigned GpOffset = 0;
7368 unsigned FpOffset = AMD64GpEndOffset;
7369 unsigned OverflowOffset = AMD64FpEndOffset;
7370 const DataLayout &DL = F.getDataLayout();
7371
7372 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7373 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7374 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7375 if (IsByVal) {
7376 // ByVal arguments always go to the overflow area.
7377 // Fixed arguments passed through the overflow area will be stepped
7378 // over by va_start, so don't count them towards the offset.
7379 if (IsFixed)
7380 continue;
7381 assert(A->getType()->isPointerTy());
7382 Type *RealTy = CB.getParamByValType(ArgNo);
7383 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7384 uint64_t AlignedSize = alignTo(ArgSize, 8);
7385 unsigned BaseOffset = OverflowOffset;
7386 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7387 Value *OriginBase = nullptr;
7388 if (MS.TrackOrigins)
7389 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7390 OverflowOffset += AlignedSize;
7391
7392 if (OverflowOffset > kParamTLSSize) {
7393 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7394 continue; // We have no space to copy shadow there.
7395 }
7396
7397 Value *ShadowPtr, *OriginPtr;
7398 std::tie(ShadowPtr, OriginPtr) =
7399 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7400 /*isStore*/ false);
7401 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7402 kShadowTLSAlignment, ArgSize);
7403 if (MS.TrackOrigins)
7404 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7405 kShadowTLSAlignment, ArgSize);
7406 } else {
7407 ArgKind AK = classifyArgument(A);
7408 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7409 AK = AK_Memory;
7410 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7411 AK = AK_Memory;
7412 Value *ShadowBase, *OriginBase = nullptr;
7413 switch (AK) {
7414 case AK_GeneralPurpose:
7415 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7416 if (MS.TrackOrigins)
7417 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7418 GpOffset += 8;
7419 assert(GpOffset <= kParamTLSSize);
7420 break;
7421 case AK_FloatingPoint:
7422 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7423 if (MS.TrackOrigins)
7424 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7425 FpOffset += 16;
7426 assert(FpOffset <= kParamTLSSize);
7427 break;
7428 case AK_Memory:
7429 if (IsFixed)
7430 continue;
7431 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7432 uint64_t AlignedSize = alignTo(ArgSize, 8);
7433 unsigned BaseOffset = OverflowOffset;
7434 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7435 if (MS.TrackOrigins) {
7436 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7437 }
7438 OverflowOffset += AlignedSize;
7439 if (OverflowOffset > kParamTLSSize) {
7440 // We have no space to copy shadow there.
7441 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7442 continue;
7443 }
7444 }
7445 // Take fixed arguments into account for GpOffset and FpOffset,
7446 // but don't actually store shadows for them.
7447 // TODO(glider): don't call get*PtrForVAArgument() for them.
7448 if (IsFixed)
7449 continue;
7450 Value *Shadow = MSV.getShadow(A);
7451 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7452 if (MS.TrackOrigins) {
7453 Value *Origin = MSV.getOrigin(A);
7454 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7455 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7456 std::max(kShadowTLSAlignment, kMinOriginAlignment));
7457 }
7458 }
7459 }
7460 Constant *OverflowSize =
7461 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7462 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7463 }
7464
7465 void finalizeInstrumentation() override {
7466 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7467 "finalizeInstrumentation called twice");
7468 if (!VAStartInstrumentationList.empty()) {
7469 // If there is a va_start in this function, make a backup copy of
7470 // va_arg_tls somewhere in the function entry block.
7471 IRBuilder<> IRB(MSV.FnPrologueEnd);
7472 VAArgOverflowSize =
7473 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7474 Value *CopySize = IRB.CreateAdd(
7475 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7476 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7477 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7478 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7479 CopySize, kShadowTLSAlignment, false);
7480
7481 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7482 Intrinsic::umin, CopySize,
7483 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7484 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7485 kShadowTLSAlignment, SrcSize);
7486 if (MS.TrackOrigins) {
7487 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7488 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7489 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7490 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7491 }
7492 }
7493
7494 // Instrument va_start.
7495 // Copy va_list shadow from the backup copy of the TLS contents.
7496 for (CallInst *OrigInst : VAStartInstrumentationList) {
7497 NextNodeIRBuilder IRB(OrigInst);
7498 Value *VAListTag = OrigInst->getArgOperand(0);
7499
7500 Value *RegSaveAreaPtrPtr =
7501 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
7502 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7503 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7504 const Align Alignment = Align(16);
7505 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7506 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7507 Alignment, /*isStore*/ true);
7508 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7509 AMD64FpEndOffset);
7510 if (MS.TrackOrigins)
7511 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7512 Alignment, AMD64FpEndOffset);
7513 Value *OverflowArgAreaPtrPtr =
7514 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
7515 Value *OverflowArgAreaPtr =
7516 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7517 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7518 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7519 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7520 Alignment, /*isStore*/ true);
7521 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7522 AMD64FpEndOffset);
7523 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7524 VAArgOverflowSize);
7525 if (MS.TrackOrigins) {
7526 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7527 AMD64FpEndOffset);
7528 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7529 VAArgOverflowSize);
7530 }
7531 }
7532 }
7533};
7534
7535/// AArch64-specific implementation of VarArgHelper.
7536struct VarArgAArch64Helper : public VarArgHelperBase {
7537 static const unsigned kAArch64GrArgSize = 64;
7538 static const unsigned kAArch64VrArgSize = 128;
7539
7540 static const unsigned AArch64GrBegOffset = 0;
7541 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7542 // Make VR space aligned to 16 bytes.
7543 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7544 static const unsigned AArch64VrEndOffset =
7545 AArch64VrBegOffset + kAArch64VrArgSize;
7546 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
7547
7548 AllocaInst *VAArgTLSCopy = nullptr;
7549 Value *VAArgOverflowSize = nullptr;
7550
7551 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7552
7553 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7554 MemorySanitizerVisitor &MSV)
7555 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7556
7557 // A very rough approximation of aarch64 argument classification rules.
7558 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7559 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7560 return {AK_GeneralPurpose, 1};
7561 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7562 return {AK_FloatingPoint, 1};
7563
7564 if (T->isArrayTy()) {
7565 auto R = classifyArgument(T->getArrayElementType());
7566 R.second *= T->getScalarType()->getArrayNumElements();
7567 return R;
7568 }
7569
7570 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7571 auto R = classifyArgument(FV->getScalarType());
7572 R.second *= FV->getNumElements();
7573 return R;
7574 }
7575
7576 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7577 return {AK_Memory, 0};
7578 }
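// For example, tracing the classification above (which is deliberately
// approximate, not the exact AAPCS64 rules):
//   i64 or ptr   -> {AK_GeneralPurpose, 1}
//   double       -> {AK_FloatingPoint, 1}
//   [2 x i64]    -> {AK_GeneralPurpose, 2}
//   <4 x float>  -> {AK_FloatingPoint, 4}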
7579
7580 // The instrumentation stores the argument shadow in a non ABI-specific
7581 // format because it does not know which argument is named (since Clang,
7582 // like in the x86_64 case, lowers the va_args in the frontend and this pass
7583 // only sees the low-level code that deals with va_list internals).
7584 // The first eight GR registers are saved in the first 64 bytes of the
7585 // va_arg TLS array, followed by the first eight FP/SIMD registers, and then
7586 // the remaining arguments.
7587 // Using constant offsets within the va_arg TLS array allows a fast copy
7588 // in the finalize instrumentation.
7589 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7590 unsigned GrOffset = AArch64GrBegOffset;
7591 unsigned VrOffset = AArch64VrBegOffset;
7592 unsigned OverflowOffset = AArch64VAEndOffset;
7593
7594 const DataLayout &DL = F.getDataLayout();
7595 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7596 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7597 auto [AK, RegNum] = classifyArgument(A->getType());
7598 if (AK == AK_GeneralPurpose &&
7599 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
7600 AK = AK_Memory;
7601 if (AK == AK_FloatingPoint &&
7602 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
7603 AK = AK_Memory;
7604 Value *Base;
7605 switch (AK) {
7606 case AK_GeneralPurpose:
7607 Base = getShadowPtrForVAArgument(IRB, GrOffset);
7608 GrOffset += 8 * RegNum;
7609 break;
7610 case AK_FloatingPoint:
7611 Base = getShadowPtrForVAArgument(IRB, VrOffset);
7612 VrOffset += 16 * RegNum;
7613 break;
7614 case AK_Memory:
7615 // Don't count fixed arguments in the overflow area - va_start will
7616 // skip right over them.
7617 if (IsFixed)
7618 continue;
7619 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7620 uint64_t AlignedSize = alignTo(ArgSize, 8);
7621 unsigned BaseOffset = OverflowOffset;
7622 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
7623 OverflowOffset += AlignedSize;
7624 if (OverflowOffset > kParamTLSSize) {
7625 // We have no space to copy shadow there.
7626 CleanUnusedTLS(IRB, Base, BaseOffset);
7627 continue;
7628 }
7629 break;
7630 }
7631 // Count Gp/Vr fixed arguments to their respective offsets, but don't
7632 // bother to actually store a shadow.
7633 if (IsFixed)
7634 continue;
7635 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7636 }
7637 Constant *OverflowSize =
7638 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
7639 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7640 }
7641
7642 // Retrieve a va_list field of 'void*' size.
7643 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7644 Value *SaveAreaPtrPtr =
7645 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7646 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
7647 }
7648
7649 // Retrieve a va_list field of 'int' size.
7650 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7651 Value *SaveAreaPtr =
7652 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7653 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
7654 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
7655 }
7656
7657 void finalizeInstrumentation() override {
7658 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7659 "finalizeInstrumentation called twice");
7660 if (!VAStartInstrumentationList.empty()) {
7661 // If there is a va_start in this function, make a backup copy of
7662 // va_arg_tls somewhere in the function entry block.
7663 IRBuilder<> IRB(MSV.FnPrologueEnd);
7664 VAArgOverflowSize =
7665 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7666 Value *CopySize = IRB.CreateAdd(
7667 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
7668 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7669 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7670 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7671 CopySize, kShadowTLSAlignment, false);
7672
7673 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7674 Intrinsic::umin, CopySize,
7675 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7676 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7677 kShadowTLSAlignment, SrcSize);
7678 }
7679
7680 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
7681 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
7682
7683 // Instrument va_start, copy va_list shadow from the backup copy of
7684 // the TLS contents.
7685 for (CallInst *OrigInst : VAStartInstrumentationList) {
7686 NextNodeIRBuilder IRB(OrigInst);
7687
7688 Value *VAListTag = OrigInst->getArgOperand(0);
7689
7690 // The variadic ABI for AArch64 creates two areas that save the incoming
7691 // argument registers (one for the 64-bit general registers x0-x7 and
7692 // another for the 128-bit FP/SIMD registers v0-v7).
7693 // We then need to propagate the argument shadow to both regions,
7694 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
7695 // The remaining arguments get their shadow propagated to 'va::stack'.
7696 // One caveat is that only the non-named arguments need to be propagated,
7697 // but the call site instrumentation saved 'all' the arguments. So, to
7698 // copy the shadow values from the va_arg TLS array, we need to adjust
7699 // the offset for both the GR and VR fields based on the __{gr,vr}_offs
7700 // values (since those offsets are computed from the number of incoming
7701 // named arguments).
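// For reference, the AAPCS64 va_list layout that the getVAField{64,32}
// offsets below assume (a sketch; the AAPCS64 is the authoritative source):
//
//   struct va_list {
//     void *__stack;   // byte offset 0:  next stack argument
//     void *__gr_top;  // byte offset 8:  end of the GR save area
//     void *__vr_top;  // byte offset 16: end of the VR save area
//     int   __gr_offs; // byte offset 24: negative offset from __gr_top
//     int   __vr_offs; // byte offset 28: negative offset from __vr_top
//   };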
7702 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
7703
7704 // Read the stack pointer from the va_list.
7705 Value *StackSaveAreaPtr =
7706 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
7707
7708 // Read both the __gr_top and __gr_off and add them up.
7709 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
7710 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
7711
7712 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
7713 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
7714
7715 // Read both the __vr_top and __vr_off and add them up.
7716 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
7717 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
7718
7719 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
7720 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
7721
7722 // The instrumentation does not know how many named arguments are in use
7723 // and, at the call site, all the arguments were saved. Since __gr_offs is
7724 // defined as '0 - ((8 - named_gr) * 8)', the idea is to propagate only the
7725 // variadic arguments by skipping the bytes of shadow from named arguments.
7726 Value *GrRegSaveAreaShadowPtrOff =
7727 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
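// For example, using the definition of __gr_offs quoted above: with two
// named GR arguments, __gr_offs == -(8 - 2) * 8 == -48, so the copy below
// starts at byte 64 + (-48) == 16 of the TLS backup (skipping the two named
// slots) and copies 64 - 16 == 48 bytes of shadow.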
7728
7729 Value *GrRegSaveAreaShadowPtr =
7730 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7731 Align(8), /*isStore*/ true)
7732 .first;
7733
7734 Value *GrSrcPtr =
7735 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
7736 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
7737
7738 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
7739 GrCopySize);
7740
7741 // Again, but for FP/SIMD values.
7742 Value *VrRegSaveAreaShadowPtrOff =
7743 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
7744
7745 Value *VrRegSaveAreaShadowPtr =
7746 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7747 Align(8), /*isStore*/ true)
7748 .first;
7749
7750 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
7751 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
7752 IRB.getInt32(AArch64VrBegOffset)),
7753 VrRegSaveAreaShadowPtrOff);
7754 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
7755
7756 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
7757 VrCopySize);
7758
7759 // And finally for remaining arguments.
7760 Value *StackSaveAreaShadowPtr =
7761 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
7762 Align(16), /*isStore*/ true)
7763 .first;
7764
7765 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
7766 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
7767
7768 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
7769 Align(16), VAArgOverflowSize);
7770 }
7771 }
7772};
7773
7774/// PowerPC64-specific implementation of VarArgHelper.
7775struct VarArgPowerPC64Helper : public VarArgHelperBase {
7776 AllocaInst *VAArgTLSCopy = nullptr;
7777 Value *VAArgSize = nullptr;
7778
7779 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
7780 MemorySanitizerVisitor &MSV)
7781 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
7782
7783 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7784 // For PowerPC, we need to deal with the alignment of stack arguments -
7785 // they are mostly aligned to 8 bytes, but vectors and i128 arrays are
7786 // aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
7787 // For that reason, we compute the current offset from the stack pointer
7788 // (which is always properly aligned) and the offset of the first vararg,
7789 // then subtract them.
7790 unsigned VAArgBase;
7791 Triple TargetTriple(F.getParent()->getTargetTriple());
7792 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
7793 // and 32 bytes for ABIv2. This is usually determined by target
7794 // endianness, but in theory could be overridden by function attribute.
7795 if (TargetTriple.isPPC64ELFv2ABI())
7796 VAArgBase = 32;
7797 else
7798 VAArgBase = 48;
7799 unsigned VAArgOffset = VAArgBase;
7800 const DataLayout &DL = F.getDataLayout();
7801 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7802 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7803 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7804 if (IsByVal) {
7805 assert(A->getType()->isPointerTy());
7806 Type *RealTy = CB.getParamByValType(ArgNo);
7807 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7808 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
7809 if (ArgAlign < 8)
7810 ArgAlign = Align(8);
7811 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7812 if (!IsFixed) {
7813 Value *Base =
7814 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7815 if (Base) {
7816 Value *AShadowPtr, *AOriginPtr;
7817 std::tie(AShadowPtr, AOriginPtr) =
7818 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7819 kShadowTLSAlignment, /*isStore*/ false);
7820
7821 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7822 kShadowTLSAlignment, ArgSize);
7823 }
7824 }
7825 VAArgOffset += alignTo(ArgSize, Align(8));
7826 } else {
7827 Value *Base;
7828 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7829 Align ArgAlign = Align(8);
7830 if (A->getType()->isArrayTy()) {
7831 // Arrays are aligned to element size, except for long double
7832 // arrays, which are aligned to 8 bytes.
7833 Type *ElementTy = A->getType()->getArrayElementType();
7834 if (!ElementTy->isPPC_FP128Ty())
7835 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7836 } else if (A->getType()->isVectorTy()) {
7837 // Vectors are naturally aligned.
7838 ArgAlign = Align(ArgSize);
7839 }
7840 if (ArgAlign < 8)
7841 ArgAlign = Align(8);
7842 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7843 if (DL.isBigEndian()) {
7844 // Adjust the shadow for arguments with size < 8 to match the
7845 // placement of bits on a big-endian system.
7846 if (ArgSize < 8)
7847 VAArgOffset += (8 - ArgSize);
7848 }
7849 if (!IsFixed) {
7850 Base =
7851 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7852 if (Base)
7853 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7854 }
7855 VAArgOffset += ArgSize;
7856 VAArgOffset = alignTo(VAArgOffset, Align(8));
7857 }
7858 if (IsFixed)
7859 VAArgBase = VAArgOffset;
7860 }
7861
7862 Constant *TotalVAArgSize =
7863 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
7864 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
7865 // new class member, i.e. it holds the total size of all varargs.
7866 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
7867 }
7868
7869 void finalizeInstrumentation() override {
7870 assert(!VAArgSize && !VAArgTLSCopy &&
7871 "finalizeInstrumentation called twice");
7872 IRBuilder<> IRB(MSV.FnPrologueEnd);
7873 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7874 Value *CopySize = VAArgSize;
7875
7876 if (!VAStartInstrumentationList.empty()) {
7877 // If there is a va_start in this function, make a backup copy of
7878 // va_arg_tls somewhere in the function entry block.
7879
7880 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7881 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7882 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7883 CopySize, kShadowTLSAlignment, false);
7884
7885 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7886 Intrinsic::umin, CopySize,
7887 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
7888 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7889 kShadowTLSAlignment, SrcSize);
7890 }
7891
7892 // Instrument va_start.
7893 // Copy va_list shadow from the backup copy of the TLS contents.
7894 for (CallInst *OrigInst : VAStartInstrumentationList) {
7895 NextNodeIRBuilder IRB(OrigInst);
7896 Value *VAListTag = OrigInst->getArgOperand(0);
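// On PPC64 ELF, va_list is a simple pointer into the parameter save area,
// so the va_list tag itself is dereferenced to find the start of the
// incoming varargs.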
7897 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
7898
7899 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
7900
7901 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7902 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7903 const DataLayout &DL = F.getDataLayout();
7904 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7905 const Align Alignment = Align(IntptrSize);
7906 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7907 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7908 Alignment, /*isStore*/ true);
7909 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7910 CopySize);
7911 }
7912 }
7913};
7914
7915/// PowerPC32-specific implementation of VarArgHelper.
7916struct VarArgPowerPC32Helper : public VarArgHelperBase {
7917 AllocaInst *VAArgTLSCopy = nullptr;
7918 Value *VAArgSize = nullptr;
7919
7920 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
7921 MemorySanitizerVisitor &MSV)
7922 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
7923
7924 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7925 unsigned VAArgBase;
7926 // Parameter save area is 8 bytes from frame pointer in PPC32
7927 VAArgBase = 8;
7928 unsigned VAArgOffset = VAArgBase;
7929 const DataLayout &DL = F.getDataLayout();
7930 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7931 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7932 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7933 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7934 if (IsByVal) {
7935 assert(A->getType()->isPointerTy());
7936 Type *RealTy = CB.getParamByValType(ArgNo);
7937 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7938 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
7939 if (ArgAlign < IntptrSize)
7940 ArgAlign = Align(IntptrSize);
7941 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7942 if (!IsFixed) {
7943 Value *Base =
7944 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7945 if (Base) {
7946 Value *AShadowPtr, *AOriginPtr;
7947 std::tie(AShadowPtr, AOriginPtr) =
7948 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7949 kShadowTLSAlignment, /*isStore*/ false);
7950
7951 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7952 kShadowTLSAlignment, ArgSize);
7953 }
7954 }
7955 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
7956 } else {
7957 Value *Base;
7958 Type *ArgTy = A->getType();
7959
7960 // On PPC32, floating-point variable arguments are stored in a separate
7961 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
7962 // them, as they will be found when checking call arguments.
7963 if (!ArgTy->isFloatingPointTy()) {
7964 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
7965 Align ArgAlign = Align(IntptrSize);
7966 if (ArgTy->isArrayTy()) {
7967 // Arrays are aligned to element size, except for long double
7968 // arrays, which are aligned to 8 bytes.
7969 Type *ElementTy = ArgTy->getArrayElementType();
7970 if (!ElementTy->isPPC_FP128Ty())
7971 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7972 } else if (ArgTy->isVectorTy()) {
7973 // Vectors are naturally aligned.
7974 ArgAlign = Align(ArgSize);
7975 }
7976 if (ArgAlign < IntptrSize)
7977 ArgAlign = Align(IntptrSize);
7978 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7979 if (DL.isBigEndian()) {
7980 // Adjust the shadow for arguments with size < IntptrSize to match
7981 // the placement of bits on a big-endian system.
7982 if (ArgSize < IntptrSize)
7983 VAArgOffset += (IntptrSize - ArgSize);
7984 }
7985 if (!IsFixed) {
7986 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
7987 ArgSize);
7988 if (Base)
7989 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
7990 kShadowTLSAlignment);
7991 }
7992 VAArgOffset += ArgSize;
7993 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
7994 }
7995 }
7996 }
7997
7998 Constant *TotalVAArgSize =
7999 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8000 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8001 // new class member, i.e. it holds the total size of all varargs.
8002 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8003 }
8004
8005 void finalizeInstrumentation() override {
8006 assert(!VAArgSize && !VAArgTLSCopy &&
8007 "finalizeInstrumentation called twice");
8008 IRBuilder<> IRB(MSV.FnPrologueEnd);
8009 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8010 Value *CopySize = VAArgSize;
8011
8012 if (!VAStartInstrumentationList.empty()) {
8013 // If there is a va_start in this function, make a backup copy of
8014 // va_arg_tls somewhere in the function entry block.
8015
8016 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8017 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8018 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8019 CopySize, kShadowTLSAlignment, false);
8020
8021 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8022 Intrinsic::umin, CopySize,
8023 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8024 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8025 kShadowTLSAlignment, SrcSize);
8026 }
8027
8028 // Instrument va_start.
8029 // Copy va_list shadow from the backup copy of the TLS contents.
8030 for (CallInst *OrigInst : VAStartInstrumentationList) {
8031 NextNodeIRBuilder IRB(OrigInst);
8032 Value *VAListTag = OrigInst->getArgOperand(0);
8033 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8034 Value *RegSaveAreaSize = CopySize;
8035
8036 // In PPC32 va_list_tag is a struct
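// (roughly, as a sketch of the SVR4 PPC32 layout the offsets below rely on):
//
//   struct __va_list_tag {
//     unsigned char gpr;       // byte offset 0
//     unsigned char fpr;       // byte offset 1
//     unsigned short reserved; // byte offset 2
//     void *overflow_arg_area; // byte offset 4
//     void *reg_save_area;     // byte offset 8
//   };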
8037 RegSaveAreaPtrPtr =
8038 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8039
8040 // On PPC 32 reg_save_area can only hold 32 bytes of data
8041 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8042 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8043
8044 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8045 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8046
8047 const DataLayout &DL = F.getDataLayout();
8048 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8049 const Align Alignment = Align(IntptrSize);
8050
8051 { // Copy reg save area
8052 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8053 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8054 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8055 Alignment, /*isStore*/ true);
8056 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8057 Alignment, RegSaveAreaSize);
8058
8059 RegSaveAreaShadowPtr =
8060 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8061 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8062 ConstantInt::get(MS.IntptrTy, 32));
8063 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8064 // We fill the FP shadow with zeroes, as uninitialized FP args should
8065 // have been found during the call base check.
8066 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8067 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8068 }
8069
8070 { // Copy overflow area
8071 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8072 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8073
8074 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8075 OverflowAreaPtrPtr =
8076 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8077 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8078
8079 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8080
8081 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8082 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8083 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8084 Alignment, /*isStore*/ true);
8085
8086 Value *OverflowVAArgTLSCopyPtr =
8087 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8088 OverflowVAArgTLSCopyPtr =
8089 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8090
8091 OverflowVAArgTLSCopyPtr =
8092 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8093 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8094 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8095 }
8096 }
8097 }
8098};
8099
8100/// SystemZ-specific implementation of VarArgHelper.
8101struct VarArgSystemZHelper : public VarArgHelperBase {
8102 static const unsigned SystemZGpOffset = 16;
8103 static const unsigned SystemZGpEndOffset = 56;
8104 static const unsigned SystemZFpOffset = 128;
8105 static const unsigned SystemZFpEndOffset = 160;
8106 static const unsigned SystemZMaxVrArgs = 8;
8107 static const unsigned SystemZRegSaveAreaSize = 160;
8108 static const unsigned SystemZOverflowOffset = 160;
8109 static const unsigned SystemZVAListTagSize = 32;
8110 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8111 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
8112
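// For reference (a sketch of the s390x ELF ABI conventions the constants
// above encode): the 160-byte register save area used by va_start keeps
// r2-r6 at byte offsets 16-56 and f0/f2/f4/f6 at byte offsets 128-160, and
// va_list is
//
//   typedef struct {
//     long __gpr;                // byte offset 0
//     long __fpr;                // byte offset 8
//     void *__overflow_arg_area; // byte offset 16
//     void *__reg_save_area;     // byte offset 24
//   } va_list[1];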
8113 bool IsSoftFloatABI;
8114 AllocaInst *VAArgTLSCopy = nullptr;
8115 AllocaInst *VAArgTLSOriginCopy = nullptr;
8116 Value *VAArgOverflowSize = nullptr;
8117
8118 enum class ArgKind {
8119 GeneralPurpose,
8120 FloatingPoint,
8121 Vector,
8122 Memory,
8123 Indirect,
8124 };
8125
8126 enum class ShadowExtension { None, Zero, Sign };
8127
8128 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8129 MemorySanitizerVisitor &MSV)
8130 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8131 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8132
8133 ArgKind classifyArgument(Type *T) {
8134 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8135 // only a few possibilities of what it can be. In particular, enums, single
8136 // element structs and large types have already been taken care of.
8137
8138 // Some i128 and fp128 arguments are converted to pointers only in the
8139 // back end.
8140 if (T->isIntegerTy(128) || T->isFP128Ty())
8141 return ArgKind::Indirect;
8142 if (T->isFloatingPointTy())
8143 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8144 if (T->isIntegerTy() || T->isPointerTy())
8145 return ArgKind::GeneralPurpose;
8146 if (T->isVectorTy())
8147 return ArgKind::Vector;
8148 return ArgKind::Memory;
8149 }
8150
8151 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8152 // ABI says: "One of the simple integer types no more than 64 bits wide.
8153 // ... If such an argument is shorter than 64 bits, replace it by a full
8154 // 64-bit integer representing the same number, using sign or zero
8155 // extension". Shadow for an integer argument has the same type as the
8156 // argument itself, so it can be sign or zero extended as well.
8157 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8158 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8159 if (ZExt) {
8160 assert(!SExt);
8161 return ShadowExtension::Zero;
8162 }
8163 if (SExt) {
8164 assert(!ZExt);
8165 return ShadowExtension::Sign;
8166 }
8167 return ShadowExtension::None;
8168 }
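// For example, if the front end marked an i32 vararg 'signext', the full
// 8-byte ABI slot carries the (sign-extended) value, so visitCallBase()
// below widens the shadow to 64 bits via CreateShadowCast(..., Signed)
// before storing it. With no extension attribute, only the value-sized tail
// of the slot gets shadow; GapSize accounts for the leading bytes on this
// big-endian target.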
8169
8170 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8171 unsigned GpOffset = SystemZGpOffset;
8172 unsigned FpOffset = SystemZFpOffset;
8173 unsigned VrIndex = 0;
8174 unsigned OverflowOffset = SystemZOverflowOffset;
8175 const DataLayout &DL = F.getDataLayout();
8176 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8177 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8178 // SystemZABIInfo does not produce ByVal parameters.
8179 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8180 Type *T = A->getType();
8181 ArgKind AK = classifyArgument(T);
8182 if (AK == ArgKind::Indirect) {
8183 T = MS.PtrTy;
8184 AK = ArgKind::GeneralPurpose;
8185 }
8186 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8187 AK = ArgKind::Memory;
8188 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8189 AK = ArgKind::Memory;
8190 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8191 AK = ArgKind::Memory;
8192 Value *ShadowBase = nullptr;
8193 Value *OriginBase = nullptr;
8194 ShadowExtension SE = ShadowExtension::None;
8195 switch (AK) {
8196 case ArgKind::GeneralPurpose: {
8197 // Always keep track of GpOffset, but store shadow only for varargs.
8198 uint64_t ArgSize = 8;
8199 if (GpOffset + ArgSize <= kParamTLSSize) {
8200 if (!IsFixed) {
8201 SE = getShadowExtension(CB, ArgNo);
8202 uint64_t GapSize = 0;
8203 if (SE == ShadowExtension::None) {
8204 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8205 assert(ArgAllocSize <= ArgSize);
8206 GapSize = ArgSize - ArgAllocSize;
8207 }
8208 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8209 if (MS.TrackOrigins)
8210 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8211 }
8212 GpOffset += ArgSize;
8213 } else {
8214 GpOffset = kParamTLSSize;
8215 }
8216 break;
8217 }
8218 case ArgKind::FloatingPoint: {
8219 // Always keep track of FpOffset, but store shadow only for varargs.
8220 uint64_t ArgSize = 8;
8221 if (FpOffset + ArgSize <= kParamTLSSize) {
8222 if (!IsFixed) {
8223 // PoP says: "A short floating-point datum requires only the
8224 // left-most 32 bit positions of a floating-point register".
8225 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8226 // don't extend shadow and don't mind the gap.
8227 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8228 if (MS.TrackOrigins)
8229 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8230 }
8231 FpOffset += ArgSize;
8232 } else {
8233 FpOffset = kParamTLSSize;
8234 }
8235 break;
8236 }
8237 case ArgKind::Vector: {
8238 // Keep track of VrIndex. No need to store shadow, since vector varargs
8239 // go through AK_Memory.
8240 assert(IsFixed);
8241 VrIndex++;
8242 break;
8243 }
8244 case ArgKind::Memory: {
8245 // Keep track of OverflowOffset and store shadow only for varargs.
8246 // Ignore fixed args, since we need to copy only the vararg portion of
8247 // the overflow area shadow.
8248 if (!IsFixed) {
8249 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8250 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8251 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8252 SE = getShadowExtension(CB, ArgNo);
8253 uint64_t GapSize =
8254 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8255 ShadowBase =
8256 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8257 if (MS.TrackOrigins)
8258 OriginBase =
8259 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8260 OverflowOffset += ArgSize;
8261 } else {
8262 OverflowOffset = kParamTLSSize;
8263 }
8264 }
8265 break;
8266 }
8267 case ArgKind::Indirect:
8268 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8269 }
8270 if (ShadowBase == nullptr)
8271 continue;
8272 Value *Shadow = MSV.getShadow(A);
8273 if (SE != ShadowExtension::None)
8274 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8275 /*Signed*/ SE == ShadowExtension::Sign);
8276 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8277 IRB.CreateStore(Shadow, ShadowBase);
8278 if (MS.TrackOrigins) {
8279 Value *Origin = MSV.getOrigin(A);
8280 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8281 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8282 kMinOriginAlignment);
8283 }
8284 }
8285 Constant *OverflowSize = ConstantInt::get(
8286 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8287 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8288 }
8289
8290 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8291 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8292 IRB.CreateAdd(
8293 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8294 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8295 MS.PtrTy);
8296 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8297 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8298 const Align Alignment = Align(8);
8299 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8300 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8301 /*isStore*/ true);
8302 // TODO(iii): copy only fragments filled by visitCallBase()
8303 // TODO(iii): support packed-stack && !use-soft-float
8304 // For use-soft-float functions, it is enough to copy just the GPRs.
8305 unsigned RegSaveAreaSize =
8306 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8307 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8308 RegSaveAreaSize);
8309 if (MS.TrackOrigins)
8310 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8311 Alignment, RegSaveAreaSize);
8312 }
8313
8314 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8315 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8316 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8317 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8318 IRB.CreateAdd(
8319 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8320 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8321 MS.PtrTy);
8322 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8323 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8324 const Align Alignment = Align(8);
8325 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8326 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8327 Alignment, /*isStore*/ true);
8328 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8329 SystemZOverflowOffset);
8330 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8331 VAArgOverflowSize);
8332 if (MS.TrackOrigins) {
8333 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8334 SystemZOverflowOffset);
8335 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8336 VAArgOverflowSize);
8337 }
8338 }
8339
8340 void finalizeInstrumentation() override {
8341 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8342 "finalizeInstrumentation called twice");
8343 if (!VAStartInstrumentationList.empty()) {
8344 // If there is a va_start in this function, make a backup copy of
8345 // va_arg_tls somewhere in the function entry block.
8346 IRBuilder<> IRB(MSV.FnPrologueEnd);
8347 VAArgOverflowSize =
8348 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8349 Value *CopySize =
8350 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8351 VAArgOverflowSize);
8352 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8353 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8354 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8355 CopySize, kShadowTLSAlignment, false);
8356
8357 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8358 Intrinsic::umin, CopySize,
8359 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8360 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8361 kShadowTLSAlignment, SrcSize);
8362 if (MS.TrackOrigins) {
8363 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8364 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8365 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8366 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8367 }
8368 }
8369
8370 // Instrument va_start.
8371 // Copy va_list shadow from the backup copy of the TLS contents.
8372 for (CallInst *OrigInst : VAStartInstrumentationList) {
8373 NextNodeIRBuilder IRB(OrigInst);
8374 Value *VAListTag = OrigInst->getArgOperand(0);
8375 copyRegSaveArea(IRB, VAListTag);
8376 copyOverflowArea(IRB, VAListTag);
8377 }
8378 }
8379};
8380
8381/// i386-specific implementation of VarArgHelper.
8382struct VarArgI386Helper : public VarArgHelperBase {
8383 AllocaInst *VAArgTLSCopy = nullptr;
8384 Value *VAArgSize = nullptr;
8385
8386 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8387 MemorySanitizerVisitor &MSV)
8388 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8389
8390 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8391 const DataLayout &DL = F.getDataLayout();
8392 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8393 unsigned VAArgOffset = 0;
8394 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8395 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8396 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8397 if (IsByVal) {
8398 assert(A->getType()->isPointerTy());
8399 Type *RealTy = CB.getParamByValType(ArgNo);
8400 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8401 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8402 if (ArgAlign < IntptrSize)
8403 ArgAlign = Align(IntptrSize);
8404 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8405 if (!IsFixed) {
8406 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8407 if (Base) {
8408 Value *AShadowPtr, *AOriginPtr;
8409 std::tie(AShadowPtr, AOriginPtr) =
8410 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8411 kShadowTLSAlignment, /*isStore*/ false);
8412
8413 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8414 kShadowTLSAlignment, ArgSize);
8415 }
8416 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8417 }
8418 } else {
8419 Value *Base;
8420 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8421 Align ArgAlign = Align(IntptrSize);
8422 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8423 if (DL.isBigEndian()) {
8424 // Adjust the shadow for arguments with size < IntptrSize to match
8425 // the placement of bits on a big-endian system.
8426 if (ArgSize < IntptrSize)
8427 VAArgOffset += (IntptrSize - ArgSize);
8428 }
8429 if (!IsFixed) {
8430 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8431 if (Base)
8432 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8433 VAArgOffset += ArgSize;
8434 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8435 }
8436 }
8437 }
8438
8439 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8440 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8441 // new class member, i.e. it holds the total size of all varargs.
8442 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8443 }
8444
8445 void finalizeInstrumentation() override {
8446 assert(!VAArgSize && !VAArgTLSCopy &&
8447 "finalizeInstrumentation called twice");
8448 IRBuilder<> IRB(MSV.FnPrologueEnd);
8449 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8450 Value *CopySize = VAArgSize;
8451
8452 if (!VAStartInstrumentationList.empty()) {
8453 // If there is a va_start in this function, make a backup copy of
8454 // va_arg_tls somewhere in the function entry block.
8455 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8456 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8457 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8458 CopySize, kShadowTLSAlignment, false);
8459
8460 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8461 Intrinsic::umin, CopySize,
8462 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8463 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8464 kShadowTLSAlignment, SrcSize);
8465 }
8466
8467 // Instrument va_start.
8468 // Copy va_list shadow from the backup copy of the TLS contents.
8469 for (CallInst *OrigInst : VAStartInstrumentationList) {
8470 NextNodeIRBuilder IRB(OrigInst);
8471 Value *VAListTag = OrigInst->getArgOperand(0);
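// On i386, va_list is effectively a single pointer into the stack argument
// area, so the va_list tag itself is dereferenced to obtain the start of
// the arguments.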
8472 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8473 Value *RegSaveAreaPtrPtr =
8474 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8475 PointerType::get(*MS.C, 0));
8476 Value *RegSaveAreaPtr =
8477 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8478 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8479 const DataLayout &DL = F.getDataLayout();
8480 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8481 const Align Alignment = Align(IntptrSize);
8482 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8483 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8484 Alignment, /*isStore*/ true);
8485 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8486 CopySize);
8487 }
8488 }
8489};
8490
8491/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8492/// LoongArch64.
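/// On these targets va_list boils down to a single pointer into the stack
/// argument area (possibly wrapped in a one-element struct), which is why one
/// generic helper that dereferences the va_list tag is sufficient.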
8493struct VarArgGenericHelper : public VarArgHelperBase {
8494 AllocaInst *VAArgTLSCopy = nullptr;
8495 Value *VAArgSize = nullptr;
8496
8497 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8498 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8499 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8500
8501 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8502 unsigned VAArgOffset = 0;
8503 const DataLayout &DL = F.getDataLayout();
8504 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8505 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8506 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8507 if (IsFixed)
8508 continue;
8509 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8510 if (DL.isBigEndian()) {
8511 // Adjust the shadow for arguments with size < IntptrSize to match the
8512 // placement of bits on a big-endian system.
8513 if (ArgSize < IntptrSize)
8514 VAArgOffset += (IntptrSize - ArgSize);
8515 }
8516 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8517 VAArgOffset += ArgSize;
8518 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8519 if (!Base)
8520 continue;
8521 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8522 }
8523
8524 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8525 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8526 // new class member, i.e. it holds the total size of all varargs.
8527 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8528 }
8529
8530 void finalizeInstrumentation() override {
8531 assert(!VAArgSize && !VAArgTLSCopy &&
8532 "finalizeInstrumentation called twice");
8533 IRBuilder<> IRB(MSV.FnPrologueEnd);
8534 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8535 Value *CopySize = VAArgSize;
8536
8537 if (!VAStartInstrumentationList.empty()) {
8538 // If there is a va_start in this function, make a backup copy of
8539 // va_arg_tls somewhere in the function entry block.
8540 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8541 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8542 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8543 CopySize, kShadowTLSAlignment, false);
8544
8545 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8546 Intrinsic::umin, CopySize,
8547 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8548 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8549 kShadowTLSAlignment, SrcSize);
8550 }
8551
8552 // Instrument va_start.
8553 // Copy va_list shadow from the backup copy of the TLS contents.
8554 for (CallInst *OrigInst : VAStartInstrumentationList) {
8555 NextNodeIRBuilder IRB(OrigInst);
8556 Value *VAListTag = OrigInst->getArgOperand(0);
8557 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8558 Value *RegSaveAreaPtrPtr =
8559 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8560 PointerType::get(*MS.C, 0));
8561 Value *RegSaveAreaPtr =
8562 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8563 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8564 const DataLayout &DL = F.getDataLayout();
8565 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8566 const Align Alignment = Align(IntptrSize);
8567 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8568 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8569 Alignment, /*isStore*/ true);
8570 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8571 CopySize);
8572 }
8573 }
8574};
8575
8576// ARM32, Loongarch64, MIPS and RISCV share the same calling conventions
8577// regarding VAArgs.
8578using VarArgARM32Helper = VarArgGenericHelper;
8579using VarArgRISCVHelper = VarArgGenericHelper;
8580using VarArgMIPSHelper = VarArgGenericHelper;
8581using VarArgLoongArch64Helper = VarArgGenericHelper;
8582
8583/// A no-op implementation of VarArgHelper.
8584struct VarArgNoOpHelper : public VarArgHelper {
8585 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8586 MemorySanitizerVisitor &MSV) {}
8587
8588 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8589
8590 void visitVAStartInst(VAStartInst &I) override {}
8591
8592 void visitVACopyInst(VACopyInst &I) override {}
8593
8594 void finalizeInstrumentation() override {}
8595};
8596
8597} // end anonymous namespace
8598
8599static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
8600 MemorySanitizerVisitor &Visitor) {
8601 // VarArg handling is implemented only for the targets listed below. On
8602 // other platforms a no-op helper is used and false positives are possible.
8603 Triple TargetTriple(Func.getParent()->getTargetTriple());
8604
8605 if (TargetTriple.getArch() == Triple::x86)
8606 return new VarArgI386Helper(Func, Msan, Visitor);
8607
8608 if (TargetTriple.getArch() == Triple::x86_64)
8609 return new VarArgAMD64Helper(Func, Msan, Visitor);
8610
8611 if (TargetTriple.isARM())
8612 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8613
8614 if (TargetTriple.isAArch64())
8615 return new VarArgAArch64Helper(Func, Msan, Visitor);
8616
8617 if (TargetTriple.isSystemZ())
8618 return new VarArgSystemZHelper(Func, Msan, Visitor);
8619
8620 // On PowerPC32 VAListTag is a struct
8621 // {char, char, i16 padding, char *, char *}
8622 if (TargetTriple.isPPC32())
8623 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
8624
8625 if (TargetTriple.isPPC64())
8626 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
8627
8628 if (TargetTriple.isRISCV32())
8629 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8630
8631 if (TargetTriple.isRISCV64())
8632 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8633
8634 if (TargetTriple.isMIPS32())
8635 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8636
8637 if (TargetTriple.isMIPS64())
8638 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8639
8640 if (TargetTriple.isLoongArch64())
8641 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
8642 /*VAListTagSize=*/8);
8643
8644 return new VarArgNoOpHelper(Func, Msan, Visitor);
8645}
8646
8647bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
8648 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
8649 return false;
8650
8651 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
8652 return false;
8653
8654 MemorySanitizerVisitor Visitor(F, *this, TLI);
8655
8656 // Clear out memory attributes.
8657 AttributeMask B;
8658 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
8659 F.removeFnAttrs(B);
8660
8661 return Visitor.runOnFunction();
8662}
#define Success
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
constexpr LLT S1
AMDGPU Uniform Intrinsic Combine
This file implements a class to represent arbitrary precision integral constant values and operations...
static bool isStore(int Opcode)
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
static cl::opt< ITMode > IT(cl::desc("IT block support"), cl::Hidden, cl::init(DefaultIT), cl::values(clEnumValN(DefaultIT, "arm-default-it", "Generate any type of IT block"), clEnumValN(RestrictedIT, "arm-restrict-it", "Disallow complex IT blocks")))
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClWithComdat("asan-with-comdat", cl::desc("Place ASan constructors in comdat sections"), cl::Hidden, cl::init(true))
VarLocInsertPt getNextNode(const DbgRecord *DVR)
Atomic ordering constants.
This file contains the simple types necessary to represent the attributes associated with functions a...
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< StatepointGC > D("statepoint-example", "an example strategy for statepoint")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
Analysis containing CSE Info
Definition CSEInfo.cpp:27
This file contains the declarations for the subclasses of Constant, which represent the different fla...
const MemoryMapParams Linux_LoongArch64_MemoryMapParams
const MemoryMapParams Linux_X86_64_MemoryMapParams
static cl::opt< int > ClTrackOrigins("dfsan-track-origins", cl::desc("Track origins of labels"), cl::Hidden, cl::init(0))
static AtomicOrdering addReleaseOrdering(AtomicOrdering AO)
static AtomicOrdering addAcquireOrdering(AtomicOrdering AO)
const MemoryMapParams Linux_AArch64_MemoryMapParams
static bool isAMustTailRetVal(Value *RetVal)
This file provides an implementation of debug counters.
#define DEBUG_COUNTER(VARNAME, COUNTERNAME, DESC)
This file defines the DenseMap class.
This file builds on the ADT/GraphTraits.h file to build generic depth first graph iterator.
@ Default
static bool runOnFunction(Function &F, bool PostInlining)
This is the interface for a simple mod/ref and alias analysis over globals.
static size_t TypeSizeToSizeIndex(uint32_t TypeSize)
#define op(i)
Hexagon Common GEP
#define _
Module.h This file contains the declarations for the Module class.
static LVOptions Options
Definition LVOptions.cpp:25
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
Machine Check Debug Module
static const PlatformMemoryMapParams Linux_S390_MemoryMapParams
static const Align kMinOriginAlignment
static cl::opt< uint64_t > ClShadowBase("msan-shadow-base", cl::desc("Define custom MSan ShadowBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClPoisonUndef("msan-poison-undef", cl::desc("Poison fully undef temporary values. " "Partially undefined constant vectors " "are unaffected by this flag (see " "-msan-poison-undef-vectors)."), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_X86_MemoryMapParams
static cl::opt< uint64_t > ClOriginBase("msan-origin-base", cl::desc("Define custom MSan OriginBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClCheckConstantShadow("msan-check-constant-shadow", cl::desc("Insert checks for constant shadow values"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams
static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams
static const unsigned kOriginSize
static cl::opt< bool > ClWithComdat("msan-with-comdat", cl::desc("Place MSan constructors in comdat sections"), cl::Hidden, cl::init(false))
static cl::opt< int > ClTrackOrigins("msan-track-origins", cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden, cl::init(0))
Track origins of uninitialized values.
static cl::opt< int > ClInstrumentationWithCallThreshold("msan-instrumentation-with-call-threshold", cl::desc("If the function being instrumented requires more than " "this number of checks and origin stores, use callbacks instead of " "inline checks (-1 means never use callbacks)."), cl::Hidden, cl::init(3500))
static cl::opt< int > ClPoisonStackPattern("msan-poison-stack-pattern", cl::desc("poison uninitialized stack variables with the given pattern"), cl::Hidden, cl::init(0xff))
static const Align kShadowTLSAlignment
static cl::opt< bool > ClHandleICmpExact("msan-handle-icmp-exact", cl::desc("exact handling of relational integer ICmp"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams
static cl::opt< bool > ClDumpStrictInstructions("msan-dump-strict-instructions", cl::desc("print out instructions with default strict semantics i.e.," "check that all the inputs are fully initialized, and mark " "the output as fully initialized. These semantics are applied " "to instructions that could not be handled explicitly nor " "heuristically."), cl::Hidden, cl::init(false))
static Constant * getOrInsertGlobal(Module &M, StringRef Name, Type *Ty)
static cl::opt< bool > ClPreciseDisjointOr("msan-precise-disjoint-or", cl::desc("Precisely poison disjoint OR. If false (legacy behavior), " "disjointedness is ignored (i.e., 1|1 is initialized)."), cl::Hidden, cl::init(false))
static const MemoryMapParams Linux_S390X_MemoryMapParams
static cl::opt< bool > ClPoisonStack("msan-poison-stack", cl::desc("poison uninitialized stack variables"), cl::Hidden, cl::init(true))
static const MemoryMapParams Linux_I386_MemoryMapParams
const char kMsanInitName[]
static cl::opt< bool > ClPoisonUndefVectors("msan-poison-undef-vectors", cl::desc("Precisely poison partially undefined constant vectors. " "If false (legacy behavior), the entire vector is " "considered fully initialized, which may lead to false " "negatives. Fully undefined constant vectors are " "unaffected by this flag (see -msan-poison-undef)."), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPrintStackNames("msan-print-stack-names", cl::desc("Print name of local stack variable"), cl::Hidden, cl::init(true))
static cl::opt< uint64_t > ClAndMask("msan-and-mask", cl::desc("Define custom MSan AndMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleLifetimeIntrinsics("msan-handle-lifetime-intrinsics", cl::desc("when possible, poison scoped variables at the beginning of the scope " "(slower, but more precise)"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClKeepGoing("msan-keep-going", cl::desc("keep going after reporting a UMR"), cl::Hidden, cl::init(false))
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams
static GlobalVariable * createPrivateConstGlobalForString(Module &M, StringRef Str)
Create a non-const global initialized with the given string.
static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClEagerChecks("msan-eager-checks", cl::desc("check arguments and return values at function call boundaries"), cl::Hidden, cl::init(false))
static cl::opt< int > ClDisambiguateWarning("msan-disambiguate-warning-threshold", cl::desc("Define threshold for number of checks per " "debug location to force origin update."), cl::Hidden, cl::init(3))
static VarArgHelper * CreateVarArgHelper(Function &Func, MemorySanitizer &Msan, MemorySanitizerVisitor &Visitor)
static const MemoryMapParams Linux_MIPS64_MemoryMapParams
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams
static cl::opt< uint64_t > ClXorMask("msan-xor-mask", cl::desc("Define custom MSan XorMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleAsmConservative("msan-handle-asm-conservative", cl::desc("conservative handling of inline assembly"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams
static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams
static const unsigned kParamTLSSize
static cl::opt< bool > ClHandleICmp("msan-handle-icmp", cl::desc("propagate shadow through ICmpEQ and ICmpNE"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClEnableKmsan("msan-kernel", cl::desc("Enable KernelMemorySanitizer instrumentation"), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStackWithCall("msan-poison-stack-with-call", cl::desc("poison uninitialized stack variables with a call"), cl::Hidden, cl::init(false))
static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams
static cl::opt< bool > ClDumpHeuristicInstructions("msan-dump-heuristic-instructions", cl::desc("Prints 'unknown' instructions that were handled heuristically. " "Use -msan-dump-strict-instructions to print instructions that " "could not be handled explicitly nor heuristically."), cl::Hidden, cl::init(false))
static const unsigned kRetvalTLSSize
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams
const char kMsanModuleCtorName[]
static const MemoryMapParams FreeBSD_I386_MemoryMapParams
static cl::opt< bool > ClCheckAccessAddress("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClDisableChecks("msan-disable-checks", cl::desc("Apply no_sanitize to the whole file"), cl::Hidden, cl::init(false))
#define T
FunctionAnalysisManager FAM
if(PassOpts->AAPipeline)
const SmallVectorImpl< MachineOperand > & Cond
static const char * name
void visit(MachineFunction &MF, MachineBasicBlock &Start, std::function< void(MachineBasicBlock *)> op)
This file implements a set that has insertion order iteration characteristics.
This file defines the SmallPtrSet class.
This file defines the SmallVector class.
This file contains some functions that are useful when dealing with strings.
#define LLVM_DEBUG(...)
Definition Debug.h:114
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
static SymbolRef::Type getType(const Symbol *Sym)
Definition TapiFile.cpp:39
Value * RHS
Value * LHS
static APInt getSignedMinValue(unsigned numBits)
Gets minimum signed value of APInt for a specific bit width.
Definition APInt.h:219
void setAlignment(Align Align)
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
const T & front() const
front - Get the first element.
Definition ArrayRef.h:146
static LLVM_ABI ArrayType * get(Type *ElementType, uint64_t NumElements)
This static method is the primary way to construct an ArrayType.
This class stores enough information to efficiently remove some attributes from an existing AttrBuild...
AttributeMask & addAttribute(Attribute::AttrKind Val)
Add an attribute to the mask.
iterator end()
Definition BasicBlock.h:472
LLVM_ABI const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
LLVM_ABI const BasicBlock * getSinglePredecessor() const
Return the predecessor of this block if it has a single predecessor block.
InstListType::iterator iterator
Instruction iterators...
Definition BasicBlock.h:170
bool isInlineAsm() const
Check if this call is an inline asm statement.
Function * getCalledFunction() const
Returns the function called, or null if this is an indirect function invocation or the function signa...
bool hasRetAttr(Attribute::AttrKind Kind) const
Determine whether the return value has the given attribute.
LLVM_ABI bool paramHasAttr(unsigned ArgNo, Attribute::AttrKind Kind) const
Determine whether the argument or parameter has the given attribute.
void removeFnAttrs(const AttributeMask &AttrsToRemove)
Removes the attributes from the function.
void setCannotMerge()
MaybeAlign getParamAlign(unsigned ArgNo) const
Extract the alignment for a call or parameter (0=unknown).
Type * getParamByValType(unsigned ArgNo) const
Extract the byval type for a call or parameter.
Value * getCalledOperand() const
Type * getParamElementType(unsigned ArgNo) const
Extract the elementtype type for a parameter.
Value * getArgOperand(unsigned i) const
void setArgOperand(unsigned i, Value *v)
FunctionType * getFunctionType() const
iterator_range< User::op_iterator > args()
Iteration adapter for range-for loops.
void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind)
Adds the attribute to the indicated argument.
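Taken together, these CallBase accessors support the per-argument inspection an instrumentation pass needs. A small sketch, assuming a hypothetical helper that merely marks pointer arguments noundef (not something this pass actually does):
  #include "llvm/IR/InstrTypes.h"

  // Hypothetical helper: walk a call's fixed arguments and add the noundef
  // parameter attribute to pointer arguments that lack it.
  static void markPointerArgsNoUndef(llvm::CallBase &CB) {
    for (unsigned I = 0, E = CB.arg_size(); I != E; ++I) {
      llvm::Value *Arg = CB.getArgOperand(I);
      if (Arg->getType()->isPointerTy() &&
          !CB.paramHasAttr(I, llvm::Attribute::NoUndef))
        CB.addParamAttr(I, llvm::Attribute::NoUndef);
    }
  }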
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:676
@ ICMP_SLT
signed less than
Definition InstrTypes.h:705
@ ICMP_SLE
signed less or equal
Definition InstrTypes.h:706
@ ICMP_SGT
signed greater than
Definition InstrTypes.h:703
@ ICMP_SGE
signed greater or equal
Definition InstrTypes.h:704
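These predicate enumerators are what the generic IRBuilder CreateICmp (listed further below) takes as its first argument. A one-line sketch with a helper name of our own:
  #include "llvm/IR/IRBuilder.h"

  // Build a signed "A < B" comparison by passing an explicit predicate.
  static llvm::Value *emitSignedLess(llvm::IRBuilderBase &IRB, llvm::Value *A,
                                     llvm::Value *B) {
    return IRB.CreateICmp(llvm::CmpInst::ICMP_SLT, A, B, "slt");
  }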
static LLVM_ABI Constant * get(ArrayType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getString(LLVMContext &Context, StringRef Initializer, bool AddNull=true)
This method constructs a CDS and initializes it with a text string.
static LLVM_ABI Constant * get(LLVMContext &Context, ArrayRef< uint8_t > Elts)
get() constructors - Return a constant with vector type with an element count and element type matchi...
static ConstantInt * getSigned(IntegerType *Ty, int64_t V)
Return a ConstantInt with the specified value for the specified type.
Definition Constants.h:131
static LLVM_ABI ConstantInt * getBool(LLVMContext &Context, bool V)
static LLVM_ABI Constant * get(StructType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getSplat(ElementCount EC, Constant *Elt)
Return a ConstantVector with the specified constant in each element.
static LLVM_ABI Constant * get(ArrayRef< Constant * > V)
This is an important base class in LLVM.
Definition Constant.h:43
static LLVM_ABI Constant * getAllOnesValue(Type *Ty)
LLVM_ABI bool isAllOnesValue() const
Return true if this is the value that would be returned by getAllOnesValue.
static LLVM_ABI Constant * getNullValue(Type *Ty)
Constructor to create a '0' constant of arbitrary type.
LLVM_ABI Constant * getAggregateElement(unsigned Elt) const
For aggregates (struct/array/vector) return the constant that corresponds to the specified element if...
LLVM_ABI bool isZeroValue() const
Return true if the value is negative zero or a null value.
Definition Constants.cpp:76
LLVM_ABI bool isNullValue() const
Return true if this is the value that would be returned by getNullValue.
Definition Constants.cpp:90
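A brief sketch of the Constant factories above, producing the all-zero and all-one bit patterns for a type (helper names are ours; getNullValue accepts any first-class type, getAllOnesValue integer, floating-point and vector types):
  #include "llvm/IR/Constants.h"

  static llvm::Constant *allZero(llvm::Type *Ty) {
    return llvm::Constant::getNullValue(Ty);
  }
  static llvm::Constant *allOnes(llvm::Type *Ty) {
    return llvm::Constant::getAllOnesValue(Ty);
  }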
static bool shouldExecute(unsigned CounterName)
bool empty() const
Definition DenseMap.h:109
unsigned getNumElements() const
static LLVM_ABI FixedVectorType * get(Type *ElementType, unsigned NumElts)
Definition Type.cpp:803
static FixedVectorType * getHalfElementsVectorType(FixedVectorType *VTy)
A handy container for a FunctionType+Callee-pointer pair, which can be passed around as a single enti...
unsigned getNumParams() const
Return the number of fixed parameters this function type requires.
LLVM_ABI void setComdat(Comdat *C)
Definition Globals.cpp:214
@ PrivateLinkage
Like Internal, but omit from symbol table.
Definition GlobalValue.h:61
@ ExternalLinkage
Externally visible function.
Definition GlobalValue.h:53
Analysis pass providing a never-invalidated alias analysis result.
ConstantInt * getInt1(bool V)
Get a constant value representing either true or false.
Definition IRBuilder.h:497
Value * CreateInsertElement(Type *VecTy, Value *NewElt, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2579
Value * CreateConstGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name="")
Definition IRBuilder.h:1939
AllocaInst * CreateAlloca(Type *Ty, unsigned AddrSpace, Value *ArraySize=nullptr, const Twine &Name="")
Definition IRBuilder.h:1833
IntegerType * getInt1Ty()
Fetch the type representing a single bit.
Definition IRBuilder.h:547
LLVM_ABI CallInst * CreateMaskedCompressStore(Value *Val, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr)
Create a call to Masked Compress Store intrinsic.
Value * CreateInsertValue(Value *Agg, Value *Val, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2633
Value * CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2567
IntegerType * getIntNTy(unsigned N)
Fetch the type representing an N-bit integer.
Definition IRBuilder.h:575
LoadInst * CreateAlignedLoad(Type *Ty, Value *Ptr, MaybeAlign Align, const char *Name)
Definition IRBuilder.h:1867
Value * CreateZExtOrTrunc(Value *V, Type *DestTy, const Twine &Name="")
Create a ZExt or Trunc from the integer value V to DestTy.
Definition IRBuilder.h:2103
CallInst * CreateMemCpy(Value *Dst, MaybeAlign DstAlign, Value *Src, MaybeAlign SrcAlign, uint64_t Size, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memcpy between the specified pointers.
Definition IRBuilder.h:687
LLVM_ABI CallInst * CreateAndReduce(Value *Src)
Create a vector int AND reduction intrinsic of the source vector.
Value * CreatePointerCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2254
Value * CreateExtractValue(Value *Agg, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2626
LLVM_ABI CallInst * CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Load intrinsic.
LLVM_ABI Value * CreateSelect(Value *C, Value *True, Value *False, const Twine &Name="", Instruction *MDFrom=nullptr)
BasicBlock::iterator GetInsertPoint() const
Definition IRBuilder.h:202
Value * CreateSExt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2097
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2202
Value * CreateLShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1513
IntegerType * getInt32Ty()
Fetch the type representing a 32-bit integer.
Definition IRBuilder.h:562
ConstantInt * getInt8(uint8_t C)
Get a constant 8-bit value.
Definition IRBuilder.h:512
Value * CreatePtrAdd(Value *Ptr, Value *Offset, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:2039
IntegerType * getInt64Ty()
Fetch the type representing a 64-bit integer.
Definition IRBuilder.h:567
Value * CreateUDiv(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1454
Value * CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2336
Value * CreateGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:1926
Value * CreateNeg(Value *V, const Twine &Name="", bool HasNSW=false)
Definition IRBuilder.h:1784
LLVM_ABI CallInst * CreateOrReduce(Value *Src)
Create a vector int OR reduction intrinsic of the source vector.
LLVM_ABI Value * CreateBinaryIntrinsic(Intrinsic::ID ID, Value *LHS, Value *RHS, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with 2 operands which is mangled on the first type.
LLVM_ABI CallInst * CreateIntrinsic(Intrinsic::ID ID, ArrayRef< Type * > Types, ArrayRef< Value * > Args, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with Args, mangled using Types.
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition IRBuilder.h:522
PHINode * CreatePHI(Type *Ty, unsigned NumReservedValues, const Twine &Name="")
Definition IRBuilder.h:2497
Value * CreateNot(Value *V, const Twine &Name="")
Definition IRBuilder.h:1808
Value * CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2332
LLVM_ABI DebugLoc getCurrentDebugLocation() const
Get location information used by debugging information.
Definition IRBuilder.cpp:64
Value * CreateSub(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1420
Value * CreateBitCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2207
ConstantInt * getIntN(unsigned N, uint64_t C)
Get a constant N-bit value, zero extended or truncated from a 64-bit value.
Definition IRBuilder.h:533
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool...
Definition IRBuilder.h:1850
Value * CreateShl(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1492
CallInst * CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, MaybeAlign Align, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memset to the specified pointer and the specified value.
Definition IRBuilder.h:630
Value * CreateZExt(Value *V, Type *DestTy, const Twine &Name="", bool IsNonNeg=false)
Definition IRBuilder.h:2085
Value * CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name="")
Definition IRBuilder.h:2601
LLVMContext & getContext() const
Definition IRBuilder.h:203
Value * CreateAnd(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1551
StoreInst * CreateStore(Value *Val, Value *Ptr, bool isVolatile=false)
Definition IRBuilder.h:1863
LLVM_ABI CallInst * CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment, Value *Mask)
Create a call to Masked Store intrinsic.
Value * CreateAdd(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1403
Value * CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2197
Value * CreateIsNotNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg != 0.
Definition IRBuilder.h:2659
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args={}, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:2511
Value * CreateTrunc(Value *V, Type *DestTy, const Twine &Name="", bool IsNUW=false, bool IsNSW=false)
Definition IRBuilder.h:2071
PointerType * getPtrTy(unsigned AddrSpace=0)
Fetch the type representing a pointer.
Definition IRBuilder.h:605
Value * CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:1708
Value * CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2364
LLVM_ABI Value * CreateTypeSize(Type *Ty, TypeSize Size)
Create an expression which evaluates to the number of units in Size at runtime.
Value * CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2344
Value * CreateIntCast(Value *V, Type *DestTy, bool isSigned, const Twine &Name="")
Definition IRBuilder.h:2280
Value * CreateIsNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg == 0.
Definition IRBuilder.h:2654
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition IRBuilder.h:207
Type * getVoidTy()
Fetch the type representing void.
Definition IRBuilder.h:600
StoreInst * CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align, bool isVolatile=false)
Definition IRBuilder.h:1886
LLVM_ABI CallInst * CreateMaskedExpandLoad(Type *Ty, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Expand Load intrinsic.
Value * CreateInBoundsPtrAdd(Value *Ptr, Value *Offset, const Twine &Name="")
Definition IRBuilder.h:2044
Value * CreateAShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1532
Value * CreateXor(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1599
Value * CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2442
Value * CreateOr(Value *LHS, Value *RHS, const Twine &Name="", bool IsDisjoint=false)
Definition IRBuilder.h:1573
IntegerType * getInt8Ty()
Fetch the type representing an 8-bit integer.
Definition IRBuilder.h:552
Value * CreateMul(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1437
LLVM_ABI CallInst * CreateMaskedScatter(Value *Val, Value *Ptrs, Align Alignment, Value *Mask=nullptr)
Create a call to Masked Scatter intrinsic.
LLVM_ABI CallInst * CreateMaskedGather(Type *Ty, Value *Ptrs, Align Alignment, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Gather intrinsic.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition IRBuilder.h:2788
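As a sketch of how the IRBuilder calls above compose (assuming ShadowPtr points at a naturally aligned i64 shadow word; this is an illustration, not the pass's own instrumentation code):
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/IRBuilder.h"

  // Load a 64-bit shadow word and test whether any bit of it is set.
  static llvm::Value *shadowWordIsNonZero(llvm::IRBuilderBase &IRB,
                                          llvm::Value *ShadowPtr) {
    llvm::Type *I64 = IRB.getInt64Ty();
    llvm::Value *Shadow =
        IRB.CreateAlignedLoad(I64, ShadowPtr, llvm::Align(8), "shadow");
    return IRB.CreateICmpNE(Shadow, llvm::Constant::getNullValue(I64),
                            "shadow_nonzero");
  }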
std::vector< ConstraintInfo > ConstraintInfoVector
Definition InlineAsm.h:123
void visit(Iterator Start, Iterator End)
Definition InstVisitor.h:87
const DebugLoc & getDebugLoc() const
Return the debug location for this node as a DebugLoc.
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
LLVM_ABI bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instructi...
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition Type.cpp:319
LLVM_ABI MDNode * createUnlikelyBranchWeights()
Return metadata containing two branch weights, with a significant bias towards the false destination.
Definition MDBuilder.cpp:48
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return a 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
PreservedAnalyses & abandon()
Mark an analysis as abandoned.
Definition Analysis.h:171
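These are the standard return values of a new-pass-manager run method; a minimal sketch of the usual convention:
  #include "llvm/IR/Analysis.h"

  // Preserve everything if nothing changed, otherwise report that cached
  // analyses may be stale.
  static llvm::PreservedAnalyses reportChange(bool ModifiedIR) {
    return ModifiedIR ? llvm::PreservedAnalyses::none()
                      : llvm::PreservedAnalyses::all();
  }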
bool remove(const value_type &X)
Remove an item from the set vector.
Definition SetVector.h:180
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:150
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
StringRef - Represent a constant reference to a string, i.e.
Definition StringRef.h:55
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition Type.cpp:414
unsigned getNumElements() const
Random access to the elements.
Type * getElementType(unsigned N) const
Analysis pass providing the TargetLibraryInfo.
Provides information about what library functions are available for the current target.
AttributeList getAttrList(LLVMContext *C, ArrayRef< unsigned > ArgNos, bool Signed, bool Ret=false, AttributeList AL=AttributeList()) const
bool getLibFunc(StringRef funcName, LibFunc &F) const
Searches for a particular function name.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isMIPS64() const
Tests whether the target is MIPS 64-bit (little and big endian).
Definition Triple.h:1040
@ loongarch64
Definition Triple.h:65
bool isRISCV32() const
Tests whether the target is 32-bit RISC-V.
Definition Triple.h:1083
bool isPPC32() const
Tests whether the target is 32-bit PowerPC (little and big endian).
Definition Triple.h:1056
ArchType getArch() const
Get the parsed architecture type of this triple.
Definition Triple.h:413
bool isRISCV64() const
Tests whether the target is 64-bit RISC-V.
Definition Triple.h:1088
bool isLoongArch64() const
Tests whether the target is 64-bit LoongArch.
Definition Triple.h:1029
bool isMIPS32() const
Tests whether the target is MIPS 32-bit (little and big endian).
Definition Triple.h:1035
bool isARM() const
Tests whether the target is ARM (little and big endian).
Definition Triple.h:923
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1061
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:1008
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1107
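These predicates are how a pass keys per-target behaviour off the module's target triple; the per-platform MemoryMapParams entries listed above are selected that way. A purely illustrative dispatch (not the pass's real mapping):
  #include "llvm/TargetParser/Triple.h"

  // Illustrative only: classify a few of the architectures this file
  // mentions; the real selection logic lives in the pass itself.
  static const char *classifyArch(const llvm::Triple &TargetTriple) {
    if (TargetTriple.isAArch64())
      return "aarch64";
    if (TargetTriple.isMIPS64())
      return "mips64";
    if (TargetTriple.isRISCV64())
      return "riscv64";
    if (TargetTriple.isSystemZ())
      return "systemz";
    return "other";
  }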
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:45
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:273
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:264
LLVM_ABI bool isScalableTy(SmallPtrSetImpl< const Type * > &Visited) const
Return true if this is a type whose size is a known multiple of vscale.
Definition Type.cpp:62
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:246
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:267
Type * getArrayElementType() const
Definition Type.h:408
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:165
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:281
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:352
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:198
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:311
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:231
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:184
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:255
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:240
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:225
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:139
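A sketch of the kind of type introspection these predicates allow: mapping a sized, non-scalable type to an integer type of the same total bit width. The flattening policy here is an assumption for illustration, not the pass's exact shadow-type computation.
  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/Type.h"

  // Return an integer type with the same total bit width as Ty, or null if
  // the width is unknown or scalable.
  static llvm::Type *flatIntTypeFor(llvm::Type *Ty, llvm::LLVMContext &Ctx) {
    if (Ty->isIntegerTy())
      return Ty;
    llvm::TypeSize Bits = Ty->getPrimitiveSizeInBits();
    if (Bits.isScalable() || Bits.getKnownMinValue() == 0)
      return nullptr;
    return llvm::IntegerType::get(Ctx, Bits.getFixedValue());
  }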
Value * getOperand(unsigned i) const
Definition User.h:232
unsigned getNumOperands() const
Definition User.h:254
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:390
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:201
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:169
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
initializer< Ty > init(const Ty &Val)
friend class Instruction
Iterator for Instructions in a `BasicBlock`.
Definition BasicBlock.h:73
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:344
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1655
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B,...
Definition STLExtras.h:2472
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0,...
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:284
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:337
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
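A sketch of how a sanitizer typically uses this helper together with appendToGlobalCtors (listed further below). The constructor and init-function names here are placeholders, not necessarily the ones this file registers.
  #include "llvm/IR/Module.h"
  #include "llvm/Transforms/Utils/ModuleUtils.h"

  // Create (or reuse) a module constructor that calls a runtime init
  // function, then register it so it runs before main().  Names are
  // illustrative.
  static void ensureExampleCtor(llvm::Module &M) {
    llvm::getOrCreateSanitizerCtorAndInitFunctions(
        M, /*CtorName=*/"example.module_ctor", /*InitName=*/"__example_init",
        /*InitArgTypes=*/{}, /*InitArgs=*/{},
        [&M](llvm::Function *Ctor, llvm::FunctionCallee /*Init*/) {
          llvm::appendToGlobalCtors(M, Ctor, /*Priority=*/0);
        });
  }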
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the ...
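A sketch of the usual shape of conditional instrumentation built with this utility and the unlikely branch weights listed earlier; ReportFn is a caller-supplied callee, not a specific MSan runtime function.
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/MDBuilder.h"
  #include "llvm/Transforms/Utils/BasicBlockUtils.h"

  // Branch on Cond just before InsertBefore and call ReportFn only on the
  // (statically unlikely) true path.
  static void emitGuardedCall(llvm::Instruction *InsertBefore,
                              llvm::Value *Cond,
                              llvm::FunctionCallee ReportFn) {
    llvm::MDBuilder MDB(InsertBefore->getContext());
    llvm::Instruction *Then = llvm::SplitBlockAndInsertIfThen(
        Cond, InsertBefore->getIterator(), /*Unreachable=*/false,
        MDB.createUnlikelyBranchWeights());
    llvm::IRBuilder<> IRB(Then);
    IRB.CreateCall(ReportFn);
  }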
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if...
Definition Local.cpp:3861
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that cannot be reached from the function's entry.
Definition Local.cpp:2883
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if the module has the flag attached; if not, add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:70
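Finally, the run and printPipeline entries above belong to the module pass itself. A sketch of scheduling it from C++ under the new pass manager, assuming the default-constructed MemorySanitizerOptions; on the command line this corresponds to opt -passes=msan.
  #include "llvm/IR/PassManager.h"
  #include "llvm/Transforms/Instrumentation/MemorySanitizer.h"

  // Append MemorySanitizer to an existing module pipeline.
  static void addMemorySanitizer(llvm::ModulePassManager &MPM) {
    MPM.addPass(llvm::MemorySanitizerPass(llvm::MemorySanitizerOptions()));
  }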