1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, and report a bug on some other instructions (e.g. JMP) if
20/// the associated shadow is poisoned.
21///
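/// As an illustration only (not part of the pass itself), this is the kind of
/// code the instrumentation is designed to flag: the branch condition depends
/// on a value whose shadow is still poisoned.
///
/// \code
///   int f() {
///     int x;       // stack allocation; shadow of x is poisoned
///     int y = x;   // shadow propagates through the copy
///     if (y)       // branch on poisoned shadow -> __msan_warning*()
///       return 1;
///     return 0;
///   }
/// \endcode
///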
22/// But there are differences too. The first and most important one is that
23/// we use compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
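///
/// For example (a sketch of the effect, not the exact IR this pass emits),
/// for a call such as
///
/// \code
///   int callee(int a);
///   int caller(int b) { return callee(b); }
/// \endcode
///
/// the caller stores the shadow of 'b' into __msan_param_tls before the call,
/// the callee loads its parameter shadow from the same slot, and the callee's
/// return-value shadow travels back to the caller through __msan_retval_tls.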
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
56///
57/// Every 4 aligned, consecutive bytes of application memory have one origin
58/// value associated with them. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
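///
/// For example (illustrative only; use() stands for any operation that forces
/// a check):
///
/// \code
///   char *p = (char *)malloc(4); // p[0..3] poisoned, origin = malloc site
///   char c;                      // poisoned, origin = this stack variable
///   p[1] = c;                    // the granule's single origin slot is now
///                                // the stack variable's (chained) origin
///   use(p[0]);                   // report on p[0] may show the stack origin
/// \endcode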
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needless overwriting origin of the 4-byte region on
66/// a short (i.e. 1 byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of an application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, an atomic
72/// store to two disjoint locations cannot be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
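///
/// Conceptually (a sketch in source terms rather than the emitted IR, with
/// shadow_of() standing for the inlined application-to-shadow mapping):
///
/// \code
///   // Writer (instrumented):
///   *shadow_of(&flag) = 0;       // store clean shadow first
///   atomic_store(&flag, 1);      // then the application store
///
///   // Reader (instrumented):
///   int v = atomic_load(&flag);  // application load first
///   int s = *shadow_of(&flag);   // then load the shadow
/// \endcode
///
/// If the reader observes the writer's store, it is also guaranteed to observe
/// the (clean) shadow written before it.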
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It can be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics can be only visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach, generating calls to
101/// __msan_instrument_asm_store(ptr, size), which defer the memory
102/// unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
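///
/// For example (conceptual, not the literal IR), given
///
/// \code
///   unsigned value;
///   asm volatile("movl $42, %0" : "=m"(value));
/// \endcode
///
/// the conservative mode emits, before the asm statement, a call equivalent to
/// __msan_instrument_asm_store(&value, sizeof(value)), and the runtime decides
/// whether it is safe to unpoison the pointed-to shadow.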
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and return the pointers to shadow and origin memory; a sketch follows this list.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
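///
/// As a sketch of the load instrumentation under KMSAN (pseudo-code with
/// illustrative names, not the literal IR), a 4-byte load of *p becomes
/// roughly:
///
/// \code
///   meta   = __msan_metadata_ptr_for_load_4(p); // {shadow ptr, origin ptr}
///   shadow = *(uint32_t *)meta.shadow_ptr;
///   origin = *(uint32_t *)meta.origin_ptr;
///   value  = *p;
/// \endcode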
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
151#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
225static const Align kMinOriginAlignment = Align(4);
226static const Align kShadowTLSAlignment = Align(8);
227
228// These constants must be kept in sync with the ones in msan.h.
229// TODO: increase size to match SVE/SVE2/SME/SME2 limits
230static const unsigned kParamTLSSize = 800;
231static const unsigned kRetvalTLSSize = 800;
232
233// Access sizes are powers of two: 1, 2, 4, 8.
234static const size_t kNumberOfAccessSizes = 4;
235
236/// Track origins of uninitialized values.
237///
238/// Adds a section to MemorySanitizer report that points to the allocation
239/// (stack or heap) the uninitialized bits came from originally.
240static cl::opt<int> ClTrackOrigins(
241 "msan-track-origins",
242 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
243 cl::init(0));
244
245static cl::opt<bool> ClKeepGoing("msan-keep-going",
246 cl::desc("keep going after reporting a UMR"),
247 cl::Hidden, cl::init(false));
248
249static cl::opt<bool>
250 ClPoisonStack("msan-poison-stack",
251 cl::desc("poison uninitialized stack variables"), cl::Hidden,
252 cl::init(true));
253
255 "msan-poison-stack-with-call",
256 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
257 cl::init(false));
258
260 "msan-poison-stack-pattern",
261 cl::desc("poison uninitialized stack variables with the given pattern"),
262 cl::Hidden, cl::init(0xff));
263
264static cl::opt<bool>
265 ClPrintStackNames("msan-print-stack-names",
266 cl::desc("Print name of local stack variable"),
267 cl::Hidden, cl::init(true));
268
269static cl::opt<bool>
270 ClPoisonUndef("msan-poison-undef",
271 cl::desc("Poison fully undef temporary values. "
272 "Partially undefined constant vectors "
273 "are unaffected by this flag (see "
274 "-msan-poison-undef-vectors)."),
275 cl::Hidden, cl::init(true));
276
277static cl::opt<bool> ClPoisonUndefVectors(
278 "msan-poison-undef-vectors",
279 cl::desc("Precisely poison partially undefined constant vectors. "
280 "If false (legacy behavior), the entire vector is "
281 "considered fully initialized, which may lead to false "
282 "negatives. Fully undefined constant vectors are "
283 "unaffected by this flag (see -msan-poison-undef)."),
284 cl::Hidden, cl::init(false));
285
287 "msan-precise-disjoint-or",
288 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
289 "disjointedness is ignored (i.e., 1|1 is initialized)."),
290 cl::Hidden, cl::init(false));
291
292static cl::opt<bool>
293 ClHandleICmp("msan-handle-icmp",
294 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
295 cl::Hidden, cl::init(true));
296
297static cl::opt<bool>
298 ClHandleICmpExact("msan-handle-icmp-exact",
299 cl::desc("exact handling of relational integer ICmp"),
300 cl::Hidden, cl::init(true));
301
303 "msan-handle-lifetime-intrinsics",
304 cl::desc(
305 "when possible, poison scoped variables at the beginning of the scope "
306 "(slower, but more precise)"),
307 cl::Hidden, cl::init(true));
308
309// When compiling the Linux kernel, we sometimes see false positives related to
310// MSan being unable to understand that inline assembly calls may initialize
311// local variables.
312// This flag makes the compiler conservatively unpoison every memory location
313// passed into an assembly call. Note that this may cause false positives.
314// Because it's impossible to figure out the array sizes, we can only unpoison
315// the first sizeof(type) bytes for each type* pointer.
317 "msan-handle-asm-conservative",
318 cl::desc("conservative handling of inline assembly"), cl::Hidden,
319 cl::init(true));
320
321// This flag controls whether we check the shadow of the address
322// operand of load or store. Such bugs are very rare, since load from
323// a garbage address typically results in SEGV, but still happen
324// (e.g. only lower bits of address are garbage, or the access happens
325// early at program startup where malloc-ed memory is more likely to
326// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
328 "msan-check-access-address",
329 cl::desc("report accesses through a pointer which has poisoned shadow"),
330 cl::Hidden, cl::init(true));
331
332static cl::opt<bool> ClEagerChecks(
333 "msan-eager-checks",
334 cl::desc("check arguments and return values at function call boundaries"),
335 cl::Hidden, cl::init(false));
336
338 "msan-dump-strict-instructions",
339 cl::desc("print out instructions with default strict semantics, i.e., "
340 "check that all the inputs are fully initialized, and mark "
341 "the output as fully initialized. These semantics are applied "
342 "to instructions that could not be handled explicitly nor "
343 "heuristically."),
344 cl::Hidden, cl::init(false));
345
346// Currently, all the heuristically handled instructions are specifically
347// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
348// to parallel 'msan-dump-strict-instructions', and to keep the door open to
349// handling non-intrinsic instructions heuristically.
351 "msan-dump-heuristic-instructions",
352 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
353 "Use -msan-dump-strict-instructions to print instructions that "
354 "could not be handled explicitly nor heuristically."),
355 cl::Hidden, cl::init(false));
356
357static cl::opt<int> ClInstrumentationWithCallThreshold(
358 "msan-instrumentation-with-call-threshold",
359 cl::desc(
360 "If the function being instrumented requires more than "
361 "this number of checks and origin stores, use callbacks instead of "
362 "inline checks (-1 means never use callbacks)."),
363 cl::Hidden, cl::init(3500));
364
365static cl::opt<bool>
366 ClEnableKmsan("msan-kernel",
367 cl::desc("Enable KernelMemorySanitizer instrumentation"),
368 cl::Hidden, cl::init(false));
369
370static cl::opt<bool>
371 ClDisableChecks("msan-disable-checks",
372 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
373 cl::init(false));
374
375static cl::opt<bool>
376 ClCheckConstantShadow("msan-check-constant-shadow",
377 cl::desc("Insert checks for constant shadow values"),
378 cl::Hidden, cl::init(true));
379
380// This is off by default because of a bug in gold:
381// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
382static cl::opt<bool>
383 ClWithComdat("msan-with-comdat",
384 cl::desc("Place MSan constructors in comdat sections"),
385 cl::Hidden, cl::init(false));
386
387// These options allow specifying custom memory map parameters.
388// See MemoryMapParams for details.
389static cl::opt<uint64_t> ClAndMask("msan-and-mask",
390 cl::desc("Define custom MSan AndMask"),
391 cl::Hidden, cl::init(0));
392
393static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
394 cl::desc("Define custom MSan XorMask"),
395 cl::Hidden, cl::init(0));
396
397static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
398 cl::desc("Define custom MSan ShadowBase"),
399 cl::Hidden, cl::init(0));
400
401static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
402 cl::desc("Define custom MSan OriginBase"),
403 cl::Hidden, cl::init(0));
404
405static cl::opt<int>
406 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
407 cl::desc("Define threshold for number of checks per "
408 "debug location to force origin update."),
409 cl::Hidden, cl::init(3));
410
411const char kMsanModuleCtorName[] = "msan.module_ctor";
412const char kMsanInitName[] = "__msan_init";
413
414namespace {
415
416// Memory map parameters used in application-to-shadow address calculation.
417// Offset = (Addr & ~AndMask) ^ XorMask
418// Shadow = ShadowBase + Offset
419// Origin = OriginBase + Offset
420struct MemoryMapParams {
421 uint64_t AndMask;
422 uint64_t XorMask;
423 uint64_t ShadowBase;
424 uint64_t OriginBase;
425};
426
427struct PlatformMemoryMapParams {
428 const MemoryMapParams *bits32;
429 const MemoryMapParams *bits64;
430};
431
432} // end anonymous namespace
433
434// i386 Linux
435static const MemoryMapParams Linux_I386_MemoryMapParams = {
436 0x000080000000, // AndMask
437 0, // XorMask (not used)
438 0, // ShadowBase (not used)
439 0x000040000000, // OriginBase
440};
441
442// x86_64 Linux
443static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
444 0, // AndMask (not used)
445 0x500000000000, // XorMask
446 0, // ShadowBase (not used)
447 0x100000000000, // OriginBase
448};
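
// Worked example for the x86_64 Linux parameters above (illustrative):
//   Addr   = 0x700000001000                    (an application address)
//   Offset = (Addr & ~0x0) ^ 0x500000000000  = 0x200000001000
//   Shadow = 0x0            + Offset         = 0x200000001000
//   Origin = 0x100000000000 + Offset         = 0x300000001000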
449
450// mips32 Linux
451// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
452// after picking good constants
453
454// mips64 Linux
455static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
456 0, // AndMask (not used)
457 0x008000000000, // XorMask
458 0, // ShadowBase (not used)
459 0x002000000000, // OriginBase
460};
461
462// ppc32 Linux
463// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
464// after picking good constants
465
466// ppc64 Linux
467static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
468 0xE00000000000, // AndMask
469 0x100000000000, // XorMask
470 0x080000000000, // ShadowBase
471 0x1C0000000000, // OriginBase
472};
473
474// s390x Linux
475static const MemoryMapParams Linux_S390X_MemoryMapParams = {
476 0xC00000000000, // AndMask
477 0, // XorMask (not used)
478 0x080000000000, // ShadowBase
479 0x1C0000000000, // OriginBase
480};
481
482// arm32 Linux
483// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
484// after picking good constants
485
486// aarch64 Linux
487static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
488 0, // AndMask (not used)
489 0x0B00000000000, // XorMask
490 0, // ShadowBase (not used)
491 0x0200000000000, // OriginBase
492};
493
494// loongarch64 Linux
495static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
496 0, // AndMask (not used)
497 0x500000000000, // XorMask
498 0, // ShadowBase (not used)
499 0x100000000000, // OriginBase
500};
501
502// riscv32 Linux
503// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
504// after picking good constants
505
506// aarch64 FreeBSD
507static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
508 0x1800000000000, // AndMask
509 0x0400000000000, // XorMask
510 0x0200000000000, // ShadowBase
511 0x0700000000000, // OriginBase
512};
513
514// i386 FreeBSD
515static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
516 0x000180000000, // AndMask
517 0x000040000000, // XorMask
518 0x000020000000, // ShadowBase
519 0x000700000000, // OriginBase
520};
521
522// x86_64 FreeBSD
523static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
524 0xc00000000000, // AndMask
525 0x200000000000, // XorMask
526 0x100000000000, // ShadowBase
527 0x380000000000, // OriginBase
528};
529
530// x86_64 NetBSD
531static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
532 0, // AndMask
533 0x500000000000, // XorMask
534 0, // ShadowBase
535 0x100000000000, // OriginBase
536};
537
538static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
539 &Linux_I386_MemoryMapParams,
540 &Linux_X86_64_MemoryMapParams,
541};
542
543static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
544 nullptr,
545 &Linux_MIPS64_MemoryMapParams,
546};
547
548static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
549 nullptr,
550 &Linux_PowerPC64_MemoryMapParams,
551};
552
553static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
554 nullptr,
555 &Linux_S390X_MemoryMapParams,
556};
557
558static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
559 nullptr,
560 &Linux_AArch64_MemoryMapParams,
561};
562
563static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
564 nullptr,
565 &Linux_LoongArch64_MemoryMapParams,
566};
567
568static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
569 nullptr,
570 &FreeBSD_AArch64_MemoryMapParams,
571};
572
573static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
574 &FreeBSD_I386_MemoryMapParams,
575 &FreeBSD_X86_64_MemoryMapParams,
576};
577
578static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
579 nullptr,
580 &NetBSD_X86_64_MemoryMapParams,
581};
582
584
585namespace {
586
587/// Instrument functions of a module to detect uninitialized reads.
588///
589/// Instantiating MemorySanitizer inserts the msan runtime library API function
590/// declarations into the module if they don't exist already. Instantiating
591/// ensures the __msan_init function is in the list of global constructors for
592/// the module.
593class MemorySanitizer {
594public:
595 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
596 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
597 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
598 initializeModule(M);
599 }
600
601 // MSan cannot be moved or copied because of MapParams.
602 MemorySanitizer(MemorySanitizer &&) = delete;
603 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
604 MemorySanitizer(const MemorySanitizer &) = delete;
605 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
606
607 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
608
609private:
610 friend struct MemorySanitizerVisitor;
611 friend struct VarArgHelperBase;
612 friend struct VarArgAMD64Helper;
613 friend struct VarArgAArch64Helper;
614 friend struct VarArgPowerPC64Helper;
615 friend struct VarArgPowerPC32Helper;
616 friend struct VarArgSystemZHelper;
617 friend struct VarArgI386Helper;
618 friend struct VarArgGenericHelper;
619
620 void initializeModule(Module &M);
621 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
622 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
623 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
624
625 template <typename... ArgsTy>
626 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
627 ArgsTy... Args);
628
629 /// True if we're compiling the Linux kernel.
630 bool CompileKernel;
631 /// Track origins (allocation points) of uninitialized values.
632 int TrackOrigins;
633 bool Recover;
634 bool EagerChecks;
635
636 Triple TargetTriple;
637 LLVMContext *C;
638 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
639 Type *OriginTy;
640 PointerType *PtrTy; ///< Pointer type in the default address space.
641
642 // XxxTLS variables represent the per-thread state in MSan and per-task state
643 // in KMSAN.
644 // For the userspace these point to thread-local globals. In the kernel land
645 // they point to the members of a per-task struct obtained via a call to
646 // __msan_get_context_state().
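
  // For orientation only: on the userspace side the runtime (compiler-rt)
  // defines these roughly as the thread-local globals sketched below; the
  // sizes follow kParamTLSSize/kRetvalTLSSize, and the authoritative
  // declarations live in the msan runtime, not here.
  //
  //   extern "C" {
  //   __thread uint64_t __msan_param_tls[800 / 8];
  //   __thread uint32_t __msan_param_origin_tls[800 / 4];
  //   __thread uint64_t __msan_retval_tls[800 / 8];
  //   __thread uint32_t __msan_retval_origin_tls;
  //   }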
647
648 /// Thread-local shadow storage for function parameters.
649 Value *ParamTLS;
650
651 /// Thread-local origin storage for function parameters.
652 Value *ParamOriginTLS;
653
654 /// Thread-local shadow storage for function return value.
655 Value *RetvalTLS;
656
657 /// Thread-local origin storage for function return value.
658 Value *RetvalOriginTLS;
659
660 /// Thread-local shadow storage for in-register va_arg function.
661 Value *VAArgTLS;
662
663 /// Thread-local origin storage for in-register va_arg function.
664 Value *VAArgOriginTLS;
665
666 /// Thread-local storage for the size of the va_arg overflow area.
667 Value *VAArgOverflowSizeTLS;
668
669 /// Are the instrumentation callbacks set up?
670 bool CallbacksInitialized = false;
671
672 /// The run-time callback to print a warning.
673 FunctionCallee WarningFn;
674
675 // These arrays are indexed by log2(AccessSize).
676 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
677 FunctionCallee MaybeWarningVarSizeFn;
678 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
679
680 /// Run-time helper that generates a new origin value for a stack
681 /// allocation.
682 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
683 // No description version
684 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
685
686 /// Run-time helper that poisons stack on function entry.
687 FunctionCallee MsanPoisonStackFn;
688
689 /// Run-time helper that records a store (or any event) of an
690 /// uninitialized value and returns an updated origin id encoding this info.
691 FunctionCallee MsanChainOriginFn;
692
693 /// Run-time helper that paints an origin over a region.
694 FunctionCallee MsanSetOriginFn;
695
696 /// MSan runtime replacements for memmove, memcpy and memset.
697 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
698
699 /// KMSAN callback for task-local function argument shadow.
700 StructType *MsanContextStateTy;
701 FunctionCallee MsanGetContextStateFn;
702
703 /// Functions for poisoning/unpoisoning local variables
704 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
705
706 /// Pair of shadow/origin pointers.
707 Type *MsanMetadata;
708
709 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
710 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
711 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
712 FunctionCallee MsanMetadataPtrForStore_1_8[4];
713 FunctionCallee MsanInstrumentAsmStoreFn;
714
715 /// Storage for return values of the MsanMetadataPtrXxx functions.
716 Value *MsanMetadataAlloca;
717
718 /// Helper to choose between different MsanMetadataPtrXxx().
719 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
720
721 /// Memory map parameters used in application-to-shadow calculation.
722 const MemoryMapParams *MapParams;
723
724 /// Custom memory map parameters used when -msan-shadow-base or
725 /// -msan-origin-base is provided.
726 MemoryMapParams CustomMapParams;
727
728 MDNode *ColdCallWeights;
729
730 /// Branch weights for origin store.
731 MDNode *OriginStoreWeights;
732};
733
734void insertModuleCtor(Module &M) {
735 getOrCreateSanitizerCtorAndInitFunctions(
736 M, kMsanModuleCtorName, kMsanInitName,
737 /*InitArgTypes=*/{},
738 /*InitArgs=*/{},
739 // This callback is invoked when the functions are created the first
740 // time. Hook them into the global ctors list in that case:
741 [&](Function *Ctor, FunctionCallee) {
742 if (!ClWithComdat) {
743 appendToGlobalCtors(M, Ctor, 0);
744 return;
745 }
746 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
747 Ctor->setComdat(MsanCtorComdat);
748 appendToGlobalCtors(M, Ctor, 0, Ctor);
749 });
750}
751
752template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
753 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
754}
755
756} // end anonymous namespace
757
758MemorySanitizerOptions::MemorySanitizerOptions(bool K, int TO, bool R,
759 bool EagerChecks)
760 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
761 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
762 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
763 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
764
765PreservedAnalyses MemorySanitizerPass::run(Module &M,
766 ModuleAnalysisManager &AM) {
767 // Return early if the nosanitize_memory module flag is present.
768 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
769 return PreservedAnalyses::all();
770 bool Modified = false;
771 if (!Options.Kernel) {
772 insertModuleCtor(M);
773 Modified = true;
774 }
775
776 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
777 for (Function &F : M) {
778 if (F.empty())
779 continue;
780 MemorySanitizer Msan(*F.getParent(), Options);
781 Modified |=
782 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
783 }
784
785 if (!Modified)
786 return PreservedAnalyses::all();
787
788 PreservedAnalyses PA = PreservedAnalyses::none();
789 // GlobalsAA is considered stateless and does not get invalidated unless
790 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
791 // make changes that require GlobalsAA to be invalidated.
792 PA.abandon<GlobalsAA>();
793 return PA;
794}
795
796void MemorySanitizerPass::printPipeline(
797 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
798 static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
799 OS, MapClassName2PassName);
800 OS << '<';
801 if (Options.Recover)
802 OS << "recover;";
803 if (Options.Kernel)
804 OS << "kernel;";
805 if (Options.EagerChecks)
806 OS << "eager-checks;";
807 OS << "track-origins=" << Options.TrackOrigins;
808 OS << '>';
809}
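
// For example (assuming the usual pass-name mapping), the options above
// round-trip through a -passes pipeline string such as:
//   opt -passes='msan<track-origins=2>' ...
// and a kernel configuration with recovery prints back as
//   msan<recover;kernel;track-origins=2>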
810
811/// Create a non-const global initialized with the given string.
812///
813/// Creates a writable global for Str so that we can pass it to the
814/// run-time lib. Runtime uses first 4 bytes of the string to store the
815/// frame ID, so the string needs to be mutable.
817 StringRef Str) {
818 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
819 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
820 GlobalValue::PrivateLinkage, StrConst, "");
821}
822
823template <typename... ArgsTy>
825MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
826 ArgsTy... Args) {
827 if (TargetTriple.getArch() == Triple::systemz) {
828 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
829 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
830 std::forward<ArgsTy>(Args)...);
831 }
832
833 return M.getOrInsertFunction(Name, MsanMetadata,
834 std::forward<ArgsTy>(Args)...);
835}
836
837/// Create KMSAN API callbacks.
838void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
839 IRBuilder<> IRB(*C);
840
841 // These will be initialized in insertKmsanPrologue().
842 RetvalTLS = nullptr;
843 RetvalOriginTLS = nullptr;
844 ParamTLS = nullptr;
845 ParamOriginTLS = nullptr;
846 VAArgTLS = nullptr;
847 VAArgOriginTLS = nullptr;
848 VAArgOverflowSizeTLS = nullptr;
849
850 WarningFn = M.getOrInsertFunction("__msan_warning",
851 TLI.getAttrList(C, {0}, /*Signed=*/false),
852 IRB.getVoidTy(), IRB.getInt32Ty());
853
854 // Requests the per-task context state (kmsan_context_state*) from the
855 // runtime library.
856 MsanContextStateTy = StructType::get(
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
858 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
859 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
860 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
861 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
862 OriginTy);
863 MsanGetContextStateFn =
864 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
865
866 MsanMetadata = StructType::get(PtrTy, PtrTy);
867
868 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
869 std::string name_load =
870 "__msan_metadata_ptr_for_load_" + std::to_string(size);
871 std::string name_store =
872 "__msan_metadata_ptr_for_store_" + std::to_string(size);
873 MsanMetadataPtrForLoad_1_8[ind] =
874 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
875 MsanMetadataPtrForStore_1_8[ind] =
876 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
877 }
878
879 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
880 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
881 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
882 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
883
884 // Functions for poisoning and unpoisoning memory.
885 MsanPoisonAllocaFn = M.getOrInsertFunction(
886 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
887 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
888 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
889}
890
891static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
892 return M.getOrInsertGlobal(Name, Ty, [&] {
893 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
894 nullptr, Name, nullptr,
895 GlobalVariable::InitialExecTLSModel);
896 });
897}
898
899/// Insert declarations for userspace-specific functions and globals.
900void MemorySanitizer::createUserspaceApi(Module &M,
901 const TargetLibraryInfo &TLI) {
902 IRBuilder<> IRB(*C);
903
904 // Create the callback.
905 // FIXME: this function should have "Cold" calling conv,
906 // which is not yet implemented.
907 if (TrackOrigins) {
908 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
909 : "__msan_warning_with_origin_noreturn";
910 WarningFn = M.getOrInsertFunction(WarningFnName,
911 TLI.getAttrList(C, {0}, /*Signed=*/false),
912 IRB.getVoidTy(), IRB.getInt32Ty());
913 } else {
914 StringRef WarningFnName =
915 Recover ? "__msan_warning" : "__msan_warning_noreturn";
916 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
917 }
918
919 // Create the global TLS variables.
920 RetvalTLS =
921 getOrInsertGlobal(M, "__msan_retval_tls",
922 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
923
924 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
925
926 ParamTLS =
927 getOrInsertGlobal(M, "__msan_param_tls",
928 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
929
930 ParamOriginTLS =
931 getOrInsertGlobal(M, "__msan_param_origin_tls",
932 ArrayType::get(OriginTy, kParamTLSSize / 4));
933
934 VAArgTLS =
935 getOrInsertGlobal(M, "__msan_va_arg_tls",
936 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
937
938 VAArgOriginTLS =
939 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
940 ArrayType::get(OriginTy, kParamTLSSize / 4));
941
942 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
943 IRB.getIntPtrTy(M.getDataLayout()));
944
945 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
946 AccessSizeIndex++) {
947 unsigned AccessSize = 1 << AccessSizeIndex;
948 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
949 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
950 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
951 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
952 MaybeWarningVarSizeFn = M.getOrInsertFunction(
953 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
954 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
955 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
956 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
957 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
958 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
959 IRB.getInt32Ty());
960 }
961
962 MsanSetAllocaOriginWithDescriptionFn =
963 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
964 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
965 MsanSetAllocaOriginNoDescriptionFn =
966 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
967 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
968 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
969 IRB.getVoidTy(), PtrTy, IntptrTy);
970}
971
972/// Insert extern declaration of runtime-provided functions and globals.
973void MemorySanitizer::initializeCallbacks(Module &M,
974 const TargetLibraryInfo &TLI) {
975 // Only do this once.
976 if (CallbacksInitialized)
977 return;
978
979 IRBuilder<> IRB(*C);
980 // Initialize callbacks that are common for kernel and userspace
981 // instrumentation.
982 MsanChainOriginFn = M.getOrInsertFunction(
983 "__msan_chain_origin",
984 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
985 IRB.getInt32Ty());
986 MsanSetOriginFn = M.getOrInsertFunction(
987 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
988 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
989 MemmoveFn =
990 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
991 MemcpyFn =
992 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
993 MemsetFn = M.getOrInsertFunction("__msan_memset",
994 TLI.getAttrList(C, {1}, /*Signed=*/true),
995 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
996
997 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
998 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
999
1000 if (CompileKernel) {
1001 createKernelApi(M, TLI);
1002 } else {
1003 createUserspaceApi(M, TLI);
1004 }
1005 CallbacksInitialized = true;
1006}
1007
1008FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1009 int size) {
1010 FunctionCallee *Fns =
1011 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1012 switch (size) {
1013 case 1:
1014 return Fns[0];
1015 case 2:
1016 return Fns[1];
1017 case 4:
1018 return Fns[2];
1019 case 8:
1020 return Fns[3];
1021 default:
1022 return nullptr;
1023 }
1024}
1025
1026/// Module-level initialization.
1027///
1028/// Inserts a call to __msan_init into the module's constructor list.
1029void MemorySanitizer::initializeModule(Module &M) {
1030 auto &DL = M.getDataLayout();
1031
1032 TargetTriple = M.getTargetTriple();
1033
1034 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1035 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1036 // Check the overrides first
1037 if (ShadowPassed || OriginPassed) {
1038 CustomMapParams.AndMask = ClAndMask;
1039 CustomMapParams.XorMask = ClXorMask;
1040 CustomMapParams.ShadowBase = ClShadowBase;
1041 CustomMapParams.OriginBase = ClOriginBase;
1042 MapParams = &CustomMapParams;
1043 } else {
1044 switch (TargetTriple.getOS()) {
1045 case Triple::FreeBSD:
1046 switch (TargetTriple.getArch()) {
1047 case Triple::aarch64:
1048 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1049 break;
1050 case Triple::x86_64:
1051 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1052 break;
1053 case Triple::x86:
1054 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1055 break;
1056 default:
1057 report_fatal_error("unsupported architecture");
1058 }
1059 break;
1060 case Triple::NetBSD:
1061 switch (TargetTriple.getArch()) {
1062 case Triple::x86_64:
1063 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1064 break;
1065 default:
1066 report_fatal_error("unsupported architecture");
1067 }
1068 break;
1069 case Triple::Linux:
1070 switch (TargetTriple.getArch()) {
1071 case Triple::x86_64:
1072 MapParams = Linux_X86_MemoryMapParams.bits64;
1073 break;
1074 case Triple::x86:
1075 MapParams = Linux_X86_MemoryMapParams.bits32;
1076 break;
1077 case Triple::mips64:
1078 case Triple::mips64el:
1079 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1080 break;
1081 case Triple::ppc64:
1082 case Triple::ppc64le:
1083 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1084 break;
1085 case Triple::systemz:
1086 MapParams = Linux_S390_MemoryMapParams.bits64;
1087 break;
1088 case Triple::aarch64:
1089 case Triple::aarch64_be:
1090 MapParams = Linux_ARM_MemoryMapParams.bits64;
1091 break;
1092 case Triple::loongarch64:
1093 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1094 break;
1095 default:
1096 report_fatal_error("unsupported architecture");
1097 }
1098 break;
1099 default:
1100 report_fatal_error("unsupported operating system");
1101 }
1102 }
1103
1104 C = &(M.getContext());
1105 IRBuilder<> IRB(*C);
1106 IntptrTy = IRB.getIntPtrTy(DL);
1107 OriginTy = IRB.getInt32Ty();
1108 PtrTy = IRB.getPtrTy();
1109
1110 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1111 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1112
1113 if (!CompileKernel) {
1114 if (TrackOrigins)
1115 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1116 return new GlobalVariable(
1117 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1118 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1119 });
1120
1121 if (Recover)
1122 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1123 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1124 GlobalValue::WeakODRLinkage,
1125 IRB.getInt32(Recover), "__msan_keep_going");
1126 });
1127 }
1128}
1129
1130namespace {
1131
1132/// A helper class that handles instrumentation of VarArg
1133/// functions on a particular platform.
1134///
1135/// Implementations are expected to insert the instrumentation
1136/// necessary to propagate argument shadow through VarArg function
1137/// calls. Visit* methods are called during an InstVisitor pass over
1138/// the function, and should avoid creating new basic blocks. A new
1139/// instance of this class is created for each instrumented function.
1140struct VarArgHelper {
1141 virtual ~VarArgHelper() = default;
1142
1143 /// Visit a CallBase.
1144 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1145
1146 /// Visit a va_start call.
1147 virtual void visitVAStartInst(VAStartInst &I) = 0;
1148
1149 /// Visit a va_copy call.
1150 virtual void visitVACopyInst(VACopyInst &I) = 0;
1151
1152 /// Finalize function instrumentation.
1153 ///
1154 /// This method is called after visiting all interesting (see above)
1155 /// instructions in a function.
1156 virtual void finalizeInstrumentation() = 0;
1157};
1158
1159struct MemorySanitizerVisitor;
1160
1161} // end anonymous namespace
1162
1163static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1164 MemorySanitizerVisitor &Visitor);
1165
1166static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1167 if (TS.isScalable())
1168 // Scalable types unconditionally take slowpaths.
1169 return kNumberOfAccessSizes;
1170 unsigned TypeSizeFixed = TS.getFixedValue();
1171 if (TypeSizeFixed <= 8)
1172 return 0;
1173 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1174}
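
// Worked examples for TypeSizeToSizeIndex above (fixed-size types):
//   i8   ->   8 bits -> index 0 (1-byte callback)
//   i32  ->  32 bits -> Log2_32_Ceil(4)  == 2 (4-byte callback)
//   i64  ->  64 bits -> Log2_32_Ceil(8)  == 3 (8-byte callback)
//   i128 -> 128 bits -> Log2_32_Ceil(16) == 4 == kNumberOfAccessSizes,
//           i.e. there is no fixed-size callback and callers take a slow path.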
1175
1176namespace {
1177
1178/// Helper class to attach debug information of the given instruction onto new
1179/// instructions inserted after.
1180class NextNodeIRBuilder : public IRBuilder<> {
1181public:
1182 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1183 SetCurrentDebugLocation(IP->getDebugLoc());
1184 }
1185};
1186
1187/// This class does all the work for a given function. Store and Load
1188/// instructions store and load corresponding shadow and origin
1189/// values. Most instructions propagate shadow from arguments to their
1190/// return values. Certain instructions (most importantly, BranchInst)
1191/// test their argument shadow and print reports (with a runtime call) if it's
1192/// non-zero.
1193struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1194 Function &F;
1195 MemorySanitizer &MS;
1196 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1197 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1198 std::unique_ptr<VarArgHelper> VAHelper;
1199 const TargetLibraryInfo *TLI;
1200 Instruction *FnPrologueEnd;
1201 SmallVector<Instruction *, 16> Instructions;
1202
1203 // The following flags disable parts of MSan instrumentation based on
1204 // exclusion list contents and command-line options.
1205 bool InsertChecks;
1206 bool PropagateShadow;
1207 bool PoisonStack;
1208 bool PoisonUndef;
1209 bool PoisonUndefVectors;
1210
1211 struct ShadowOriginAndInsertPoint {
1212 Value *Shadow;
1213 Value *Origin;
1214 Instruction *OrigIns;
1215
1216 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1217 : Shadow(S), Origin(O), OrigIns(I) {}
1218 };
1219 SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1220 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1221 SmallSetVector<AllocaInst *, 16> AllocaSet;
1223 SmallVector<StoreInst *, 16> StoreList;
1224 int64_t SplittableBlocksCount = 0;
1225
1226 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1227 const TargetLibraryInfo &TLI)
1228 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1229 bool SanitizeFunction =
1230 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1231 InsertChecks = SanitizeFunction;
1232 PropagateShadow = SanitizeFunction;
1233 PoisonStack = SanitizeFunction && ClPoisonStack;
1234 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1235 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1236
1237 // In the presence of unreachable blocks, we may see Phi nodes with
1238 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1239 // blocks, such nodes will not have any shadow value associated with them.
1240 // It's easier to remove unreachable blocks than deal with missing shadow.
1241 removeUnreachableBlocks(F);
1242
1243 MS.initializeCallbacks(*F.getParent(), TLI);
1244 FnPrologueEnd =
1245 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1246 .CreateIntrinsic(Intrinsic::donothing, {});
1247
1248 if (MS.CompileKernel) {
1249 IRBuilder<> IRB(FnPrologueEnd);
1250 insertKmsanPrologue(IRB);
1251 }
1252
1253 LLVM_DEBUG(if (!InsertChecks) dbgs()
1254 << "MemorySanitizer is not inserting checks into '"
1255 << F.getName() << "'\n");
1256 }
1257
1258 bool instrumentWithCalls(Value *V) {
1259 // Constants likely will be eliminated by follow-up passes.
1260 if (isa<Constant>(V))
1261 return false;
1262 ++SplittableBlocksCount;
1263 return ClInstrumentationWithCallThreshold >= 0 &&
1264 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1265 }
1266
1267 bool isInPrologue(Instruction &I) {
1268 return I.getParent() == FnPrologueEnd->getParent() &&
1269 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1270 }
1271
1272 // Creates a new origin and records the stack trace. In general we can call
1273 // this function for any origin manipulation we like. However, it costs
1274 // runtime resources, so use it wisely, only where it provides additional
1275 // information helpful to a user.
1276 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1277 if (MS.TrackOrigins <= 1)
1278 return V;
1279 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1280 }
1281
1282 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1283 const DataLayout &DL = F.getDataLayout();
1284 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1285 if (IntptrSize == kOriginSize)
1286 return Origin;
1287 assert(IntptrSize == kOriginSize * 2);
1288 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1289 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1290 }
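
  // Worked example for originToIntptr above, on a target with 8-byte
  // pointers: Origin = 0xABCD1234 becomes 0xABCD1234ABCD1234, so one
  // intptr-sized store paints two consecutive 4-byte origin slots at once.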
1291
1292 /// Fill memory range with the given origin value.
1293 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1294 TypeSize TS, Align Alignment) {
1295 const DataLayout &DL = F.getDataLayout();
1296 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1297 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1298 assert(IntptrAlignment >= kMinOriginAlignment);
1299 assert(IntptrSize >= kOriginSize);
1300
1301 // Note: The loop-based form works for fixed-length vectors too; however,
1302 // we prefer to unroll and specialize the alignment handling below.
1303 if (TS.isScalable()) {
1304 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1305 Value *RoundUp =
1306 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1307 Value *End =
1308 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1309 auto [InsertPt, Index] =
1311 IRB.SetInsertPoint(InsertPt);
1312
1313 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1315 return;
1316 }
1317
1318 unsigned Size = TS.getFixedValue();
1319
1320 unsigned Ofs = 0;
1321 Align CurrentAlignment = Alignment;
1322 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1323 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1324 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1325 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1326 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1327 : IntptrOriginPtr;
1328 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1329 Ofs += IntptrSize / kOriginSize;
1330 CurrentAlignment = IntptrAlignment;
1331 }
1332 }
1333
1334 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1335 Value *GEP =
1336 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1337 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1338 CurrentAlignment = kMinOriginAlignment;
1339 }
1340 }
1341
1342 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1343 Value *OriginPtr, Align Alignment) {
1344 const DataLayout &DL = F.getDataLayout();
1345 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1346 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1347 // ZExt cannot convert between vector and scalar
1348 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1349 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1350 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1351 // Origin is not needed: value is initialized or const shadow is
1352 // ignored.
1353 return;
1354 }
1355 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1356 // Copy origin as the value is definitely uninitialized.
1357 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1358 OriginAlignment);
1359 return;
1360 }
1361 // Fallback to runtime check, which still can be optimized out later.
1362 }
1363
1364 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1365 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1366 if (instrumentWithCalls(ConvertedShadow) &&
1367 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1368 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1369 Value *ConvertedShadow2 =
1370 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1371 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1372 CB->addParamAttr(0, Attribute::ZExt);
1373 CB->addParamAttr(2, Attribute::ZExt);
1374 } else {
1375 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1376 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1377 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1378 IRBuilder<> IRBNew(CheckTerm);
1379 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1380 OriginAlignment);
1381 }
1382 }
1383
1384 void materializeStores() {
1385 for (StoreInst *SI : StoreList) {
1386 IRBuilder<> IRB(SI);
1387 Value *Val = SI->getValueOperand();
1388 Value *Addr = SI->getPointerOperand();
1389 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1390 Value *ShadowPtr, *OriginPtr;
1391 Type *ShadowTy = Shadow->getType();
1392 const Align Alignment = SI->getAlign();
1393 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1394 std::tie(ShadowPtr, OriginPtr) =
1395 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1396
1397 [[maybe_unused]] StoreInst *NewSI =
1398 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1399 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1400
1401 if (SI->isAtomic())
1402 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1403
1404 if (MS.TrackOrigins && !SI->isAtomic())
1405 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1406 OriginAlignment);
1407 }
1408 }
1409
1410 // Returns true if Debug Location corresponds to multiple warnings.
1411 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1412 if (MS.TrackOrigins < 2)
1413 return false;
1414
1415 if (LazyWarningDebugLocationCount.empty())
1416 for (const auto &I : InstrumentationList)
1417 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1418
1419 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1420 }
1421
1422 /// Helper function to insert a warning at IRB's current insert point.
1423 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1424 if (!Origin)
1425 Origin = (Value *)IRB.getInt32(0);
1426 assert(Origin->getType()->isIntegerTy());
1427
1428 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1429 // Try to create additional origin with debug info of the last origin
1430 // instruction. It may provide additional information to the user.
1431 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1432 assert(MS.TrackOrigins);
1433 auto NewDebugLoc = OI->getDebugLoc();
1434 // Origin update with missing or the same debug location provides no
1435 // additional value.
1436 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1437 // Insert update just before the check, so we call runtime only just
1438 // before the report.
1439 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1440 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1441 Origin = updateOrigin(Origin, IRBOrigin);
1442 }
1443 }
1444 }
1445
1446 if (MS.CompileKernel || MS.TrackOrigins)
1447 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1448 else
1449 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1450 // FIXME: Insert UnreachableInst if !MS.Recover?
1451 // This may invalidate some of the following checks and needs to be done
1452 // at the very end.
1453 }
1454
1455 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1456 Value *Origin) {
1457 const DataLayout &DL = F.getDataLayout();
1458 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1459 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1460 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1461 // ZExt cannot convert between vector and scalar
1462 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1463 Value *ConvertedShadow2 =
1464 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1465
1466 if (SizeIndex < kNumberOfAccessSizes) {
1467 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1468 CallBase *CB = IRB.CreateCall(
1469 Fn,
1470 {ConvertedShadow2,
1471 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1472 CB->addParamAttr(0, Attribute::ZExt);
1473 CB->addParamAttr(1, Attribute::ZExt);
1474 } else {
1475 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1476 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1477 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1478 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1479 CallBase *CB = IRB.CreateCall(
1480 Fn,
1481 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1482 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1483 CB->addParamAttr(1, Attribute::ZExt);
1484 CB->addParamAttr(2, Attribute::ZExt);
1485 }
1486 } else {
1487 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1488 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1489 Cmp, &*IRB.GetInsertPoint(),
1490 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1491
1492 IRB.SetInsertPoint(CheckTerm);
1493 insertWarningFn(IRB, Origin);
1494 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1495 }
1496 }
1497
1498 void materializeInstructionChecks(
1499 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1500 const DataLayout &DL = F.getDataLayout();
1501 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1502 // correct origin.
1503 bool Combine = !MS.TrackOrigins;
1504 Instruction *Instruction = InstructionChecks.front().OrigIns;
1505 Value *Shadow = nullptr;
1506 for (const auto &ShadowData : InstructionChecks) {
1507 assert(ShadowData.OrigIns == Instruction);
1508 IRBuilder<> IRB(Instruction);
1509
1510 Value *ConvertedShadow = ShadowData.Shadow;
1511
1512 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1513 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1514 // Skip, value is initialized or const shadow is ignored.
1515 continue;
1516 }
1517 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1518 // Report as the value is definitely uninitialized.
1519 insertWarningFn(IRB, ShadowData.Origin);
1520 if (!MS.Recover)
1521 return; // Always fail and stop here, no need to check the rest.
1522 // Skip the entire instruction.
1523 continue;
1524 }
1525 // Fallback to runtime check, which still can be optimized out later.
1526 }
1527
1528 if (!Combine) {
1529 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1530 continue;
1531 }
1532
1533 if (!Shadow) {
1534 Shadow = ConvertedShadow;
1535 continue;
1536 }
1537
1538 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1539 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1540 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1541 }
1542
1543 if (Shadow) {
1544 assert(Combine);
1545 IRBuilder<> IRB(Instruction);
1546 materializeOneCheck(IRB, Shadow, nullptr);
1547 }
1548 }
1549
1550 static bool isAArch64SVCount(Type *Ty) {
1551 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1552 return TTy->getName() == "aarch64.svcount";
1553 return false;
1554 }
1555
1556 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1557 // 'target("aarch64.svcount")'), but not e.g., <vscale x 4 x i32>.
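// A few illustrative results (hypothetical inputs, matching the check below):
//   target("aarch64.svcount")  -> true   (scalable, not a vector)
//   <vscale x 4 x i32>         -> false  (scalable, but a vector)
//   i32                        -> false  (not scalable)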
1558 static bool isScalableNonVectorType(Type *Ty) {
1559 if (!isAArch64SVCount(Ty))
1560 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1561 << "\n");
1562
1563 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1564 }
1565
1566 void materializeChecks() {
1567#ifndef NDEBUG
1568 // For assert below.
1569 SmallPtrSet<Instruction *, 16> Done;
1570#endif
1571
1572 for (auto I = InstrumentationList.begin();
1573 I != InstrumentationList.end();) {
1574 auto OrigIns = I->OrigIns;
1575 // Checks are grouped by the original instruction. We handle all checks
1576 // recorded for an instruction at once.
1577 assert(Done.insert(OrigIns).second);
1578 auto J = std::find_if(I + 1, InstrumentationList.end(),
1579 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1580 return OrigIns != R.OrigIns;
1581 });
1582 // Process all checks of instruction at once.
1583 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1584 I = J;
1585 }
1586
1587 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1588 }
1589
1590 // Sets up the KMSAN per-task context state pointers in the function prologue.
1591 void insertKmsanPrologue(IRBuilder<> &IRB) {
1592 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1593 Constant *Zero = IRB.getInt32(0);
1594 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1595 {Zero, IRB.getInt32(0)}, "param_shadow");
1596 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1597 {Zero, IRB.getInt32(1)}, "retval_shadow");
1598 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1599 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1600 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1601 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1602 MS.VAArgOverflowSizeTLS =
1603 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1604 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1605 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1606 {Zero, IRB.getInt32(5)}, "param_origin");
1607 MS.RetvalOriginTLS =
1608 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1609 {Zero, IRB.getInt32(6)}, "retval_origin");
1610 if (MS.TargetTriple.getArch() == Triple::systemz)
1611 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1612 }
1613
1614 /// Add MemorySanitizer instrumentation to a function.
1615 bool runOnFunction() {
1616 // Iterate all BBs in depth-first order and create shadow instructions
1617 // for all instructions (where applicable).
1618 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1619 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1620 visit(*BB);
1621
1622 // `visit` above only collects instructions. Process them after the CFG
1623 // walk, since instrumentation may change the CFG (e.g. split blocks).
1624 for (Instruction *I : Instructions)
1625 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1626
1627 // Finalize PHI nodes.
1628 for (PHINode *PN : ShadowPHINodes) {
1629 PHINode *PNS = cast<PHINode>(getShadow(PN));
1630 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1631 size_t NumValues = PN->getNumIncomingValues();
1632 for (size_t v = 0; v < NumValues; v++) {
1633 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1634 if (PNO)
1635 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1636 }
1637 }
1638
1639 VAHelper->finalizeInstrumentation();
1640
1641 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1642 // instrumenting only allocas.
1643 if (InstrumentLifetimeStart) {
1644 for (auto Item : LifetimeStartList) {
1645 instrumentAlloca(*Item.second, Item.first);
1646 AllocaSet.remove(Item.second);
1647 }
1648 }
1649 // Poison the allocas for which we didn't instrument the corresponding
1650 // lifetime intrinsics.
1651 for (AllocaInst *AI : AllocaSet)
1652 instrumentAlloca(*AI);
1653
1654 // Insert shadow value checks.
1655 materializeChecks();
1656
1657 // Delayed instrumentation of StoreInst.
1658 // This may not add new address checks.
1659 materializeStores();
1660
1661 return true;
1662 }
1663
1664 /// Compute the shadow type that corresponds to a given Value.
1665 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1666
1667 /// Compute the shadow type that corresponds to a given Type.
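/// A few illustrative mappings (not exhaustive): i32 stays i32,
/// <4 x float> becomes <4 x i32>, and {i32, double} becomes {i32, i64};
/// scalable non-vector types are returned unchanged.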
1668 Type *getShadowTy(Type *OrigTy) {
1669 if (!OrigTy->isSized()) {
1670 return nullptr;
1671 }
1672 // For integer type, shadow is the same as the original type.
1673 // This may return weird-sized types like i1.
1674 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1675 return IT;
1676 const DataLayout &DL = F.getDataLayout();
1677 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1678 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1679 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1680 VT->getElementCount());
1681 }
1682 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1683 return ArrayType::get(getShadowTy(AT->getElementType()),
1684 AT->getNumElements());
1685 }
1686 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1687 SmallVector<Type *, 4> Elements;
1688 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1689 Elements.push_back(getShadowTy(ST->getElementType(i)));
1690 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1691 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1692 return Res;
1693 }
1694 if (isScalableNonVectorType(OrigTy)) {
1695 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1696 << "\n");
1697 return OrigTy;
1698 }
1699
1700 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1701 return IntegerType::get(*MS.C, TypeSize);
1702 }
1703
1704 /// Extract combined shadow of struct elements as a bool
1705 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1706 IRBuilder<> &IRB) {
1707 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1708 Value *Aggregator = FalseVal;
1709
1710 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1711 // Combine by ORing together each element's bool shadow
1712 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1713 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1714
1715 if (Aggregator != FalseVal)
1716 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1717 else
1718 Aggregator = ShadowBool;
1719 }
1720
1721 return Aggregator;
1722 }
1723
1724 // Extract combined shadow of array elements
1725 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1726 IRBuilder<> &IRB) {
1727 if (!Array->getNumElements())
1728 return IRB.getIntN(/* width */ 1, /* value */ 0);
1729
1730 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1731 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1732
1733 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1734 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1735 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1736 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1737 }
1738 return Aggregator;
1739 }
1740
1741 /// Convert a shadow value to its flattened variant. The resulting
1742 /// shadow may not necessarily have the same bit width as the input
1743 /// value, but it will always be comparable to zero.
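/// A few illustrative collapses (assuming the helpers below): a <4 x i32>
/// shadow is bitcast to i128, a scalable vector is OR-reduced first, and
/// struct or array shadows are folded element by element.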
1744 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1745 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1746 return collapseStructShadow(Struct, V, IRB);
1747 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1748 return collapseArrayShadow(Array, V, IRB);
1749 if (isa<VectorType>(V->getType())) {
1750 if (isa<ScalableVectorType>(V->getType()))
1751 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1752 unsigned BitWidth =
1753 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1754 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1755 }
1756 return V;
1757 }
1758
1759 // Convert a scalar value to an i1 by comparing with 0
1760 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1761 Type *VTy = V->getType();
1762 if (!VTy->isIntegerTy())
1763 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1764 if (VTy->getIntegerBitWidth() == 1)
1765 // Just converting a bool to a bool, so do nothing.
1766 return V;
1767 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1768 }
1769
1770 Type *ptrToIntPtrType(Type *PtrTy) const {
1771 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1772 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1773 VectTy->getElementCount());
1774 }
1775 assert(PtrTy->isIntOrPtrTy());
1776 return MS.IntptrTy;
1777 }
1778
1779 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1780 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1781 return VectorType::get(
1782 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1783 VectTy->getElementCount());
1784 }
1785 assert(IntPtrTy == MS.IntptrTy);
1786 return MS.PtrTy;
1787 }
1788
1789 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1790 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1791 return ConstantVector::getSplat(
1792 VectTy->getElementCount(),
1793 constToIntPtr(VectTy->getElementType(), C));
1794 }
1795 assert(IntPtrTy == MS.IntptrTy);
1796 // TODO: Avoid implicit trunc?
1797 // See https://github.com/llvm/llvm-project/issues/112510.
1798 return ConstantInt::get(MS.IntptrTy, C, /*IsSigned=*/false,
1799 /*ImplicitTrunc=*/true);
1800 }
1801
1802 /// Returns the integer shadow offset that corresponds to a given
1803 /// application address, whereby:
1804 ///
1805 /// Offset = (Addr & ~AndMask) ^ XorMask
1806 /// Shadow = ShadowBase + Offset
1807 /// Origin = (OriginBase + Offset) & ~Alignment
1808 ///
1809 /// Note: for efficiency, many shadow mappings only use the XorMask
1810 /// and OriginBase; the AndMask and ShadowBase are often zero.
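/// As a rough illustration with a hypothetical mapping (AndMask == 0,
/// XorMask == 0x500000000000, ShadowBase == 0), the IR produced here and by
/// the caller is approximately:
///   %addr_int = ptrtoint ptr %p to i64
///   %offset   = xor i64 %addr_int, 0x500000000000   ; this function
///   %shadow_p = inttoptr i64 %offset to ptr         ; done by the caller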
1811 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1812 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1813 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1814
1815 if (uint64_t AndMask = MS.MapParams->AndMask)
1816 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1817
1818 if (uint64_t XorMask = MS.MapParams->XorMask)
1819 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1820 return OffsetLong;
1821 }
1822
1823 /// Compute the shadow and origin addresses corresponding to a given
1824 /// application address.
1825 ///
1826 /// Shadow = ShadowBase + Offset
1827 /// Origin = (OriginBase + Offset) & ~3ULL
1828 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1829 /// a single pointee.
1830 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1831 std::pair<Value *, Value *>
1832 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1833 MaybeAlign Alignment) {
1834 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1835 if (!VectTy) {
1836 assert(Addr->getType()->isPointerTy());
1837 } else {
1838 assert(VectTy->getElementType()->isPointerTy());
1839 }
1840 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1841 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1842 Value *ShadowLong = ShadowOffset;
1843 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1844 ShadowLong =
1845 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1846 }
1847 Value *ShadowPtr = IRB.CreateIntToPtr(
1848 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1849
1850 Value *OriginPtr = nullptr;
1851 if (MS.TrackOrigins) {
1852 Value *OriginLong = ShadowOffset;
1853 uint64_t OriginBase = MS.MapParams->OriginBase;
1854 if (OriginBase != 0)
1855 OriginLong =
1856 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1857 if (!Alignment || *Alignment < kMinOriginAlignment) {
1858 uint64_t Mask = kMinOriginAlignment.value() - 1;
1859 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1860 }
1861 OriginPtr = IRB.CreateIntToPtr(
1862 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1863 }
1864 return std::make_pair(ShadowPtr, OriginPtr);
1865 }
1866
1867 template <typename... ArgsTy>
1868 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1869 ArgsTy... Args) {
1870 if (MS.TargetTriple.getArch() == Triple::systemz) {
1871 IRB.CreateCall(Callee,
1872 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1873 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1874 }
1875
1876 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1877 }
1878
1879 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1880 IRBuilder<> &IRB,
1881 Type *ShadowTy,
1882 bool isStore) {
1883 Value *ShadowOriginPtrs;
1884 const DataLayout &DL = F.getDataLayout();
1885 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1886
1887 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1888 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1889 if (Getter) {
1890 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1891 } else {
1892 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1893 ShadowOriginPtrs = createMetadataCall(
1894 IRB,
1895 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1896 AddrCast, SizeVal);
1897 }
1898 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1899 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1900 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1901
1902 return std::make_pair(ShadowPtr, OriginPtr);
1903 }
1904
1905 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1906 /// a single pointee.
1907 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1908 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1909 IRBuilder<> &IRB,
1910 Type *ShadowTy,
1911 bool isStore) {
1912 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1913 if (!VectTy) {
1914 assert(Addr->getType()->isPointerTy());
1915 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1916 }
1917
1918 // TODO: Support callbacks with vectors of addresses.
1919 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1920 Value *ShadowPtrs = ConstantInt::getNullValue(
1921 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1922 Value *OriginPtrs = nullptr;
1923 if (MS.TrackOrigins)
1924 OriginPtrs = ConstantInt::getNullValue(
1925 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1926 for (unsigned i = 0; i < NumElements; ++i) {
1927 Value *OneAddr =
1928 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1929 auto [ShadowPtr, OriginPtr] =
1930 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1931
1932 ShadowPtrs = IRB.CreateInsertElement(
1933 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1934 if (MS.TrackOrigins)
1935 OriginPtrs = IRB.CreateInsertElement(
1936 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1937 }
1938 return {ShadowPtrs, OriginPtrs};
1939 }
1940
1941 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1942 Type *ShadowTy,
1943 MaybeAlign Alignment,
1944 bool isStore) {
1945 if (MS.CompileKernel)
1946 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1947 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1948 }
1949
1950 /// Compute the shadow address for a given function argument.
1951 ///
1952 /// Shadow = ParamTLS+ArgOffset.
1953 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1954 return IRB.CreatePtrAdd(MS.ParamTLS,
1955 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1956 }
1957
1958 /// Compute the origin address for a given function argument.
1959 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1960 if (!MS.TrackOrigins)
1961 return nullptr;
1962 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1963 ConstantInt::get(MS.IntptrTy, ArgOffset),
1964 "_msarg_o");
1965 }
1966
1967 /// Compute the shadow address for a retval.
1968 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1969 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1970 }
1971
1972 /// Compute the origin address for a retval.
1973 Value *getOriginPtrForRetval() {
1974 // We keep a single origin for the entire retval. Might be too optimistic.
1975 return MS.RetvalOriginTLS;
1976 }
1977
1978 /// Set SV to be the shadow value for V.
1979 void setShadow(Value *V, Value *SV) {
1980 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1981 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1982 }
1983
1984 /// Set Origin to be the origin value for V.
1985 void setOrigin(Value *V, Value *Origin) {
1986 if (!MS.TrackOrigins)
1987 return;
1988 assert(!OriginMap.count(V) && "Values may only have one origin");
1989 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1990 OriginMap[V] = Origin;
1991 }
1992
1993 Constant *getCleanShadow(Type *OrigTy) {
1994 Type *ShadowTy = getShadowTy(OrigTy);
1995 if (!ShadowTy)
1996 return nullptr;
1997 return Constant::getNullValue(ShadowTy);
1998 }
1999
2000 /// Create a clean shadow value for a given value.
2001 ///
2002 /// Clean shadow (all zeroes) means all bits of the value are defined
2003 /// (initialized).
2004 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2005
2006 /// Create a dirty shadow of a given shadow type.
2007 Constant *getPoisonedShadow(Type *ShadowTy) {
2008 assert(ShadowTy);
2009 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2010 return Constant::getAllOnesValue(ShadowTy);
2011 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2012 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2013 getPoisonedShadow(AT->getElementType()));
2014 return ConstantArray::get(AT, Vals);
2015 }
2016 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2017 SmallVector<Constant *, 4> Vals;
2018 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2019 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2020 return ConstantStruct::get(ST, Vals);
2021 }
2022 llvm_unreachable("Unexpected shadow type");
2023 }
2024
2025 /// Create a dirty shadow for a given value.
2026 Constant *getPoisonedShadow(Value *V) {
2027 Type *ShadowTy = getShadowTy(V);
2028 if (!ShadowTy)
2029 return nullptr;
2030 return getPoisonedShadow(ShadowTy);
2031 }
2032
2033 /// Create a clean (zero) origin.
2034 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2035
2036 /// Get the shadow value for a given Value.
2037 ///
2038 /// This function either returns the value set earlier with setShadow,
2039 /// or extracts it from ParamTLS (for function arguments).
2040 Value *getShadow(Value *V) {
2041 if (Instruction *I = dyn_cast<Instruction>(V)) {
2042 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2043 return getCleanShadow(V);
2044 // For instructions the shadow is already stored in the map.
2045 Value *Shadow = ShadowMap[V];
2046 if (!Shadow) {
2047 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2048 assert(Shadow && "No shadow for a value");
2049 }
2050 return Shadow;
2051 }
2052 // Handle fully undefined values
2053 // (partially undefined constant vectors are handled later)
2054 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2055 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2056 : getCleanShadow(V);
2057 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2058 return AllOnes;
2059 }
2060 if (Argument *A = dyn_cast<Argument>(V)) {
2061 // For arguments we compute the shadow on demand and store it in the map.
2062 Value *&ShadowPtr = ShadowMap[V];
2063 if (ShadowPtr)
2064 return ShadowPtr;
2065 Function *F = A->getParent();
2066 IRBuilder<> EntryIRB(FnPrologueEnd);
2067 unsigned ArgOffset = 0;
2068 const DataLayout &DL = F->getDataLayout();
2069 for (auto &FArg : F->args()) {
2070 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2071 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2072 ? "vscale not fully supported\n"
2073 : "Arg is not sized\n"));
2074 if (A == &FArg) {
2075 ShadowPtr = getCleanShadow(V);
2076 setOrigin(A, getCleanOrigin());
2077 break;
2078 }
2079 continue;
2080 }
2081
2082 unsigned Size = FArg.hasByValAttr()
2083 ? DL.getTypeAllocSize(FArg.getParamByValType())
2084 : DL.getTypeAllocSize(FArg.getType());
2085
2086 if (A == &FArg) {
2087 bool Overflow = ArgOffset + Size > kParamTLSSize;
2088 if (FArg.hasByValAttr()) {
2089 // ByVal pointer itself has clean shadow. We copy the actual
2090 // argument shadow to the underlying memory.
2091 // Figure out maximal valid memcpy alignment.
2092 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2093 FArg.getParamAlign(), FArg.getParamByValType());
2094 Value *CpShadowPtr, *CpOriginPtr;
2095 std::tie(CpShadowPtr, CpOriginPtr) =
2096 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2097 /*isStore*/ true);
2098 if (!PropagateShadow || Overflow) {
2099 // ParamTLS overflow.
2100 EntryIRB.CreateMemSet(
2101 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2102 Size, ArgAlign);
2103 } else {
2104 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2105 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2106 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2107 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2108 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2109
2110 if (MS.TrackOrigins) {
2111 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2112 // FIXME: OriginSize should be:
2113 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2114 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2115 EntryIRB.CreateMemCpy(
2116 CpOriginPtr,
2117 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2118 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2119 OriginSize);
2120 }
2121 }
2122 }
2123
2124 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2125 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2126 ShadowPtr = getCleanShadow(V);
2127 setOrigin(A, getCleanOrigin());
2128 } else {
2129 // Shadow over TLS
2130 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2131 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2132 kShadowTLSAlignment);
2133 if (MS.TrackOrigins) {
2134 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2135 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2136 }
2137 }
2138 LLVM_DEBUG(dbgs()
2139 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2140 break;
2141 }
2142
2143 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2144 }
2145 assert(ShadowPtr && "Could not find shadow for an argument");
2146 return ShadowPtr;
2147 }
2148
2149 // Check for partially-undefined constant vectors
2150 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2151 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2152 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2153 PoisonUndefVectors) {
2154 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2155 SmallVector<Constant *, 32> ShadowVector(NumElems);
2156 for (unsigned i = 0; i != NumElems; ++i) {
2157 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2158 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2159 : getCleanShadow(Elem);
2160 }
2161
2162 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2163 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2164 << *ShadowConstant << "\n");
2165
2166 return ShadowConstant;
2167 }
2168
2169 // TODO: partially-undefined constant arrays, structures, and nested types
2170
2171 // For everything else the shadow is zero.
2172 return getCleanShadow(V);
2173 }
2174
2175 /// Get the shadow for i-th argument of the instruction I.
2176 Value *getShadow(Instruction *I, int i) {
2177 return getShadow(I->getOperand(i));
2178 }
2179
2180 /// Get the origin for a value.
2181 Value *getOrigin(Value *V) {
2182 if (!MS.TrackOrigins)
2183 return nullptr;
2184 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2185 return getCleanOrigin();
2187 "Unexpected value type in getOrigin()");
2188 if (Instruction *I = dyn_cast<Instruction>(V)) {
2189 if (I->getMetadata(LLVMContext::MD_nosanitize))
2190 return getCleanOrigin();
2191 }
2192 Value *Origin = OriginMap[V];
2193 assert(Origin && "Missing origin");
2194 return Origin;
2195 }
2196
2197 /// Get the origin for i-th argument of the instruction I.
2198 Value *getOrigin(Instruction *I, int i) {
2199 return getOrigin(I->getOperand(i));
2200 }
2201
2202 /// Remember the place where a shadow check should be inserted.
2203 ///
2204 /// This location will be later instrumented with a check that will print a
2205 /// UMR warning in runtime if the shadow value is not 0.
2206 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2207 assert(Shadow);
2208 if (!InsertChecks)
2209 return;
2210
2211 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2212 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2213 << *OrigIns << "\n");
2214 return;
2215 }
2216
2217 Type *ShadowTy = Shadow->getType();
2218 if (isScalableNonVectorType(ShadowTy)) {
2219 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2220 << " before " << *OrigIns << "\n");
2221 return;
2222 }
2223#ifndef NDEBUG
2224 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2225 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2226 "Can only insert checks for integer, vector, and aggregate shadow "
2227 "types");
2228#endif
2229 InstrumentationList.push_back(
2230 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2231 }
2232
2233 /// Get shadow for value, and remember the place where a shadow check should
2234 /// be inserted.
2235 ///
2236 /// This location will be later instrumented with a check that will print a
2237 /// UMR warning in runtime if the value is not fully defined.
2238 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2239 assert(Val);
2240 Value *Shadow, *Origin;
2241 if (ClCheckConstantShadow) {
2242 Shadow = getShadow(Val);
2243 if (!Shadow)
2244 return;
2245 Origin = getOrigin(Val);
2246 } else {
2247 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2248 if (!Shadow)
2249 return;
2250 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2251 }
2252 insertCheckShadow(Shadow, Origin, OrigIns);
2253 }
2254
2255 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2256 switch (a) {
2257 case AtomicOrdering::NotAtomic:
2258 return AtomicOrdering::NotAtomic;
2259 case AtomicOrdering::Unordered:
2260 case AtomicOrdering::Monotonic:
2261 case AtomicOrdering::Release:
2262 return AtomicOrdering::Release;
2263 case AtomicOrdering::Acquire:
2264 case AtomicOrdering::AcquireRelease:
2265 return AtomicOrdering::AcquireRelease;
2266 case AtomicOrdering::SequentiallyConsistent:
2267 return AtomicOrdering::SequentiallyConsistent;
2268 }
2269 llvm_unreachable("Unknown ordering");
2270 }
2271
2272 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2273 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2274 uint32_t OrderingTable[NumOrderings] = {};
2275
2276 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2277 OrderingTable[(int)AtomicOrderingCABI::release] =
2278 (int)AtomicOrderingCABI::release;
2279 OrderingTable[(int)AtomicOrderingCABI::consume] =
2280 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2281 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2282 (int)AtomicOrderingCABI::acq_rel;
2283 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2284 (int)AtomicOrderingCABI::seq_cst;
2285
2286 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2287 }
2288
2289 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2290 switch (a) {
2291 case AtomicOrdering::NotAtomic:
2292 return AtomicOrdering::NotAtomic;
2293 case AtomicOrdering::Unordered:
2294 case AtomicOrdering::Monotonic:
2295 case AtomicOrdering::Acquire:
2296 return AtomicOrdering::Acquire;
2297 case AtomicOrdering::Release:
2298 case AtomicOrdering::AcquireRelease:
2299 return AtomicOrdering::AcquireRelease;
2300 case AtomicOrdering::SequentiallyConsistent:
2301 return AtomicOrdering::SequentiallyConsistent;
2302 }
2303 llvm_unreachable("Unknown ordering");
2304 }
2305
2306 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2307 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2308 uint32_t OrderingTable[NumOrderings] = {};
2309
2310 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2311 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2312 OrderingTable[(int)AtomicOrderingCABI::consume] =
2313 (int)AtomicOrderingCABI::acquire;
2314 OrderingTable[(int)AtomicOrderingCABI::release] =
2315 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2316 (int)AtomicOrderingCABI::acq_rel;
2317 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2318 (int)AtomicOrderingCABI::seq_cst;
2319
2320 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2321 }
2322
2323 // ------------------- Visitors.
2324 using InstVisitor<MemorySanitizerVisitor>::visit;
2325 void visit(Instruction &I) {
2326 if (I.getMetadata(LLVMContext::MD_nosanitize))
2327 return;
2328 // Don't want to visit if we're in the prologue
2329 if (isInPrologue(I))
2330 return;
2331 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2332 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2333 // We still need to set the shadow and origin to clean values.
2334 setShadow(&I, getCleanShadow(&I));
2335 setOrigin(&I, getCleanOrigin());
2336 return;
2337 }
2338
2339 Instructions.push_back(&I);
2340 }
2341
2342 /// Instrument LoadInst
2343 ///
2344 /// Loads the corresponding shadow and (optionally) origin.
2345 /// Optionally, checks that the load address is fully defined.
2346 void visitLoadInst(LoadInst &I) {
2347 assert(I.getType()->isSized() && "Load type must have size");
2348 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2349 NextNodeIRBuilder IRB(&I);
2350 Type *ShadowTy = getShadowTy(&I);
2351 Value *Addr = I.getPointerOperand();
2352 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2353 const Align Alignment = I.getAlign();
2354 if (PropagateShadow) {
2355 std::tie(ShadowPtr, OriginPtr) =
2356 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2357 setShadow(&I,
2358 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2359 } else {
2360 setShadow(&I, getCleanShadow(&I));
2361 }
2362
2363 if (ClCheckAccessAddress)
2364 insertCheckShadowOf(I.getPointerOperand(), &I);
2365
2366 if (I.isAtomic())
2367 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2368
2369 if (MS.TrackOrigins) {
2370 if (PropagateShadow) {
2371 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2372 setOrigin(
2373 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2374 } else {
2375 setOrigin(&I, getCleanOrigin());
2376 }
2377 }
2378 }
2379
2380 /// Instrument StoreInst
2381 ///
2382 /// Stores the corresponding shadow and (optionally) origin.
2383 /// Optionally, checks that the store address is fully defined.
2384 void visitStoreInst(StoreInst &I) {
2385 StoreList.push_back(&I);
2386 if (ClCheckAccessAddress)
2387 insertCheckShadowOf(I.getPointerOperand(), &I);
2388 }
2389
2390 void handleCASOrRMW(Instruction &I) {
2391 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2392
2393 IRBuilder<> IRB(&I);
2394 Value *Addr = I.getOperand(0);
2395 Value *Val = I.getOperand(1);
2396 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2397 /*isStore*/ true)
2398 .first;
2399
2400 if (ClCheckAccessAddress)
2401 insertCheckShadowOf(Addr, &I);
2402
2403 // Only test the conditional argument of cmpxchg instruction.
2404 // The other argument can potentially be uninitialized, but we can not
2405 // detect this situation reliably without possible false positives.
2406 if (isa<AtomicCmpXchgInst>(I))
2407 insertCheckShadowOf(Val, &I);
2408
2409 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2410
2411 setShadow(&I, getCleanShadow(&I));
2412 setOrigin(&I, getCleanOrigin());
2413 }
2414
2415 void visitAtomicRMWInst(AtomicRMWInst &I) {
2416 handleCASOrRMW(I);
2417 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2418 }
2419
2420 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2421 handleCASOrRMW(I);
2422 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2423 }
2424
2425 // Vector manipulation.
2426 void visitExtractElementInst(ExtractElementInst &I) {
2427 insertCheckShadowOf(I.getOperand(1), &I);
2428 IRBuilder<> IRB(&I);
2429 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2430 "_msprop"));
2431 setOrigin(&I, getOrigin(&I, 0));
2432 }
2433
2434 void visitInsertElementInst(InsertElementInst &I) {
2435 insertCheckShadowOf(I.getOperand(2), &I);
2436 IRBuilder<> IRB(&I);
2437 auto *Shadow0 = getShadow(&I, 0);
2438 auto *Shadow1 = getShadow(&I, 1);
2439 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2440 "_msprop"));
2441 setOriginForNaryOp(I);
2442 }
2443
2444 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2445 IRBuilder<> IRB(&I);
2446 auto *Shadow0 = getShadow(&I, 0);
2447 auto *Shadow1 = getShadow(&I, 1);
2448 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2449 "_msprop"));
2450 setOriginForNaryOp(I);
2451 }
2452
2453 // Casts.
2454 void visitSExtInst(SExtInst &I) {
2455 IRBuilder<> IRB(&I);
2456 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2457 setOrigin(&I, getOrigin(&I, 0));
2458 }
2459
2460 void visitZExtInst(ZExtInst &I) {
2461 IRBuilder<> IRB(&I);
2462 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2463 setOrigin(&I, getOrigin(&I, 0));
2464 }
2465
2466 void visitTruncInst(TruncInst &I) {
2467 IRBuilder<> IRB(&I);
2468 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2469 setOrigin(&I, getOrigin(&I, 0));
2470 }
2471
2472 void visitBitCastInst(BitCastInst &I) {
2473 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2474 // a musttail call and a ret, don't instrument. New instructions are not
2475 // allowed after a musttail call.
2476 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2477 if (CI->isMustTailCall())
2478 return;
2479 IRBuilder<> IRB(&I);
2480 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2481 setOrigin(&I, getOrigin(&I, 0));
2482 }
2483
2484 void visitPtrToIntInst(PtrToIntInst &I) {
2485 IRBuilder<> IRB(&I);
2486 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2487 "_msprop_ptrtoint"));
2488 setOrigin(&I, getOrigin(&I, 0));
2489 }
2490
2491 void visitIntToPtrInst(IntToPtrInst &I) {
2492 IRBuilder<> IRB(&I);
2493 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2494 "_msprop_inttoptr"));
2495 setOrigin(&I, getOrigin(&I, 0));
2496 }
2497
2498 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2499 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2500 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2501 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2502 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2503 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2504
2505 /// Generic handler to compute shadow for bitwise AND.
2506 ///
2507 /// This is used by 'visitAnd' but also as a primitive for other handlers.
2508 ///
2509 /// This code is precise: it implements the rule that "And" of an initialized
2510 /// zero bit always results in an initialized value:
2511 // 1&1 => 1; 0&1 => 0; p&1 => p;
2512 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2513 // 1&p => p; 0&p => 0; p&p => p;
2514 //
2515 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
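// Worked single-bit example (hypothetical values): V1 = 0 with S1 = 0 (a
// defined zero) and V2 unknown with S2 = 1 gives
//   S = (0 & 1) | (0 & 1) | (0 & V2) = 0,
// i.e. "0 & anything" is a fully defined 0.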
2516 Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
2517 Value *S2) {
2518 if (V1->getType() != S1->getType()) {
2519 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2520 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2521 }
2522
2523 Value *S1S2 = IRB.CreateAnd(S1, S2);
2524 Value *V1S2 = IRB.CreateAnd(V1, S2);
2525 Value *S1V2 = IRB.CreateAnd(S1, V2);
2526
2527 return IRB.CreateOr({S1S2, V1S2, S1V2});
2528 }
2529
2530 /// Handler for bitwise AND operator.
2531 void visitAnd(BinaryOperator &I) {
2532 IRBuilder<> IRB(&I);
2533 Value *V1 = I.getOperand(0);
2534 Value *V2 = I.getOperand(1);
2535 Value *S1 = getShadow(&I, 0);
2536 Value *S2 = getShadow(&I, 1);
2537
2538 Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);
2539
2540 setShadow(&I, OutShadow);
2541 setOriginForNaryOp(I);
2542 }
2543
2544 void visitOr(BinaryOperator &I) {
2545 IRBuilder<> IRB(&I);
2546 // "Or" of 1 and a poisoned value results in unpoisoned value:
2547 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2548 // 1|0 => 1; 0|0 => 0; p|0 => p;
2549 // 1|p => 1; 0|p => p; p|p => p;
2550 //
2551 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2552 //
2553 // If the "disjoint OR" property is violated, the result is poison, and
2554 // hence the entire shadow is uninitialized:
2555 // S = S | SignExt(V1 & V2 != 0)
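// Worked single-bit example (hypothetical values): V1 = 1 with S1 = 0 (a
// defined one) and V2 unknown with S2 = 1 gives
//   S = (0 & 1) | (~1 & 1) | (0 & ~V2) = 0,
// i.e. "1 | anything" is a fully defined 1.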
2556 Value *S1 = getShadow(&I, 0);
2557 Value *S2 = getShadow(&I, 1);
2558 Value *V1 = I.getOperand(0);
2559 Value *V2 = I.getOperand(1);
2560 if (V1->getType() != S1->getType()) {
2561 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2562 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2563 }
2564
2565 Value *NotV1 = IRB.CreateNot(V1);
2566 Value *NotV2 = IRB.CreateNot(V2);
2567
2568 Value *S1S2 = IRB.CreateAnd(S1, S2);
2569 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2570 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2571
2572 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2573
2574 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2575 Value *V1V2 = IRB.CreateAnd(V1, V2);
2576 Value *DisjointOrShadow = IRB.CreateSExt(
2577 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2578 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2579 }
2580
2581 setShadow(&I, S);
2582 setOriginForNaryOp(I);
2583 }
2584
2585 /// Default propagation of shadow and/or origin.
2586 ///
2587 /// This class implements the general case of shadow propagation, used in all
2588 /// cases where we don't know and/or don't care about what the operation
2589 /// actually does. It converts all input shadow values to a common type
2590 /// (extending or truncating as necessary), and bitwise OR's them.
2591 ///
2592 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2593 /// fully initialized), and less prone to false positives.
2594 ///
2595 /// This class also implements the general case of origin propagation. For a
2596 /// Nary operation, result origin is set to the origin of an argument that is
2597 /// not entirely initialized. If there is more than one such argument, the
2598 /// rightmost of them is picked. It does not matter which one is picked if all
2599 /// arguments are initialized.
2600 template <bool CombineShadow> class Combiner {
2601 Value *Shadow = nullptr;
2602 Value *Origin = nullptr;
2603 IRBuilder<> &IRB;
2604 MemorySanitizerVisitor *MSV;
2605
2606 public:
2607 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2608 : IRB(IRB), MSV(MSV) {}
2609
2610 /// Add a pair of shadow and origin values to the mix.
2611 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2612 if (CombineShadow) {
2613 assert(OpShadow);
2614 if (!Shadow)
2615 Shadow = OpShadow;
2616 else {
2617 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2618 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2619 }
2620 }
2621
2622 if (MSV->MS.TrackOrigins) {
2623 assert(OpOrigin);
2624 if (!Origin) {
2625 Origin = OpOrigin;
2626 } else {
2627 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2628 // No point in adding something that might result in 0 origin value.
2629 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2630 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2631 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2632 }
2633 }
2634 }
2635 return *this;
2636 }
2637
2638 /// Add an application value to the mix.
2639 Combiner &Add(Value *V) {
2640 Value *OpShadow = MSV->getShadow(V);
2641 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2642 return Add(OpShadow, OpOrigin);
2643 }
2644
2645 /// Set the current combined values as the given instruction's shadow
2646 /// and origin.
2647 void Done(Instruction *I) {
2648 if (CombineShadow) {
2649 assert(Shadow);
2650 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2651 MSV->setShadow(I, Shadow);
2652 }
2653 if (MSV->MS.TrackOrigins) {
2654 assert(Origin);
2655 MSV->setOrigin(I, Origin);
2656 }
2657 }
2658
2659 /// Store the current combined value at the specified origin
2660 /// location.
2661 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2662 if (MSV->MS.TrackOrigins) {
2663 assert(Origin);
2664 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2665 }
2666 }
2667 };
2668
2669 using ShadowAndOriginCombiner = Combiner<true>;
2670 using OriginCombiner = Combiner<false>;
2671
2672 /// Propagate origin for arbitrary operation.
2673 void setOriginForNaryOp(Instruction &I) {
2674 if (!MS.TrackOrigins)
2675 return;
2676 IRBuilder<> IRB(&I);
2677 OriginCombiner OC(this, IRB);
2678 for (Use &Op : I.operands())
2679 OC.Add(Op.get());
2680 OC.Done(&I);
2681 }
2682
2683 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2684 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2685 "Vector of pointers is not a valid shadow type");
2686 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2687 Ty->getScalarSizeInBits()
2688 : Ty->getPrimitiveSizeInBits();
2689 }
2690
2691 /// Cast between two shadow types, extending or truncating as
2692 /// necessary.
2693 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2694 bool Signed = false) {
2695 Type *srcTy = V->getType();
2696 if (srcTy == dstTy)
2697 return V;
2698 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2699 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2700 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2701 return IRB.CreateICmpNE(V, getCleanShadow(V));
2702
2703 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2704 return IRB.CreateIntCast(V, dstTy, Signed);
2705 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2706 cast<VectorType>(dstTy)->getElementCount() ==
2707 cast<VectorType>(srcTy)->getElementCount())
2708 return IRB.CreateIntCast(V, dstTy, Signed);
2709 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2710 Value *V2 =
2711 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2712 return IRB.CreateBitCast(V2, dstTy);
2713 // TODO: handle struct types.
2714 }
2715
2716 /// Cast an application value to the type of its own shadow.
2717 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2718 Type *ShadowTy = getShadowTy(V);
2719 if (V->getType() == ShadowTy)
2720 return V;
2721 if (V->getType()->isPtrOrPtrVectorTy())
2722 return IRB.CreatePtrToInt(V, ShadowTy);
2723 else
2724 return IRB.CreateBitCast(V, ShadowTy);
2725 }
2726
2727 /// Propagate shadow for arbitrary operation.
2728 void handleShadowOr(Instruction &I) {
2729 IRBuilder<> IRB(&I);
2730 ShadowAndOriginCombiner SC(this, IRB);
2731 for (Use &Op : I.operands())
2732 SC.Add(Op.get());
2733 SC.Done(&I);
2734 }
2735
2736 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2737 // of elements.
2738 //
2739 // For example, suppose we have:
2740 // VectorA: <a0, a1, a2, a3, a4, a5>
2741 // VectorB: <b0, b1, b2, b3, b4, b5>
2742 // ReductionFactor: 3
2743 // Shards: 1
2744 // The output would be:
2745 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2746 //
2747 // If we have:
2748 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2749 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2750 // ReductionFactor: 2
2751 // Shards: 2
2752 // then a and b each have 2 "shards", resulting in the output being
2753 // interleaved:
2754 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2755 //
2756 // This is convenient for instrumenting horizontal add/sub.
2757 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2758 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2759 unsigned Shards, Value *VectorA, Value *VectorB) {
2760 assert(isa<FixedVectorType>(VectorA->getType()));
2761 unsigned NumElems =
2762 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2763
2764 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2765 if (VectorB) {
2766 assert(VectorA->getType() == VectorB->getType());
2767 TotalNumElems *= 2;
2768 }
2769
2770 assert(NumElems % (ReductionFactor * Shards) == 0);
2771
2772 Value *Or = nullptr;
2773
2774 IRBuilder<> IRB(&I);
2775 for (unsigned i = 0; i < ReductionFactor; i++) {
2776 SmallVector<int, 16> Mask;
2777
2778 for (unsigned j = 0; j < Shards; j++) {
2779 unsigned Offset = NumElems / Shards * j;
2780
2781 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2782 Mask.push_back(Offset + X + i);
2783
2784 if (VectorB) {
2785 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2786 Mask.push_back(NumElems + Offset + X + i);
2787 }
2788 }
2789
2790 Value *Masked;
2791 if (VectorB)
2792 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2793 else
2794 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2795
2796 if (Or)
2797 Or = IRB.CreateOr(Or, Masked);
2798 else
2799 Or = Masked;
2800 }
2801
2802 return Or;
2803 }
2804
2805 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2806 /// fields.
2807 ///
2808 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2809 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2810 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2811 assert(I.arg_size() == 1 || I.arg_size() == 2);
2812
2813 assert(I.getType()->isVectorTy());
2814 assert(I.getArgOperand(0)->getType()->isVectorTy());
2815
2816 [[maybe_unused]] FixedVectorType *ParamType =
2817 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2818 assert((I.arg_size() != 2) ||
2819 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2820 [[maybe_unused]] FixedVectorType *ReturnType =
2821 cast<FixedVectorType>(I.getType());
2822 assert(ParamType->getNumElements() * I.arg_size() ==
2823 2 * ReturnType->getNumElements());
2824
2825 IRBuilder<> IRB(&I);
2826
2827 // Horizontal OR of shadow
2828 Value *FirstArgShadow = getShadow(&I, 0);
2829 Value *SecondArgShadow = nullptr;
2830 if (I.arg_size() == 2)
2831 SecondArgShadow = getShadow(&I, 1);
2832
2833 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2834 FirstArgShadow, SecondArgShadow);
2835
2836 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2837
2838 setShadow(&I, OrShadow);
2839 setOriginForNaryOp(I);
2840 }
2841
2842 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2843 /// fields, with the parameters reinterpreted to have elements of a specified
2844 /// width. For example:
2845 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2846 /// conceptually operates on
2847 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2848 /// and can be handled with ReinterpretElemWidth == 16.
2849 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
2850 int ReinterpretElemWidth) {
2851 assert(I.arg_size() == 1 || I.arg_size() == 2);
2852
2853 assert(I.getType()->isVectorTy());
2854 assert(I.getArgOperand(0)->getType()->isVectorTy());
2855
2856 FixedVectorType *ParamType =
2857 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2858 assert((I.arg_size() != 2) ||
2859 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2860
2861 [[maybe_unused]] FixedVectorType *ReturnType =
2862 cast<FixedVectorType>(I.getType());
2863 assert(ParamType->getNumElements() * I.arg_size() ==
2864 2 * ReturnType->getNumElements());
2865
2866 IRBuilder<> IRB(&I);
2867
2868 FixedVectorType *ReinterpretShadowTy = nullptr;
2869 assert(isAligned(Align(ReinterpretElemWidth),
2870 ParamType->getPrimitiveSizeInBits()));
2871 ReinterpretShadowTy = FixedVectorType::get(
2872 IRB.getIntNTy(ReinterpretElemWidth),
2873 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2874
2875 // Horizontal OR of shadow
2876 Value *FirstArgShadow = getShadow(&I, 0);
2877 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2878
2879 // If we had two parameters each with an odd number of elements, the total
2880 // number of elements is even, but we have never seen this in extant
2881 // instruction sets, so we enforce that each parameter must have an even
2882 // number of elements.
2883 assert(isAligned(
2884 Align(2),
2885 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2886
2887 Value *SecondArgShadow = nullptr;
2888 if (I.arg_size() == 2) {
2889 SecondArgShadow = getShadow(&I, 1);
2890 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2891 }
2892
2893 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2894 FirstArgShadow, SecondArgShadow);
2895
2896 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2897
2898 setShadow(&I, OrShadow);
2899 setOriginForNaryOp(I);
2900 }
2901
2902 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2903
2904 // Handle multiplication by constant.
2905 //
2906 // Handle a special case of multiplication by constant that may have one or
2907 // more zeros in the lower bits. This makes corresponding number of lower bits
2908 // of the result zero as well. We model it by shifting the other operand
2909 // shadow left by the required number of bits. Effectively, we transform
2910 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2911 // We use multiplication by 2**N instead of shift to cover the case of
2912 // multiplication by 0, which may occur in some elements of a vector operand.
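// For instance (hypothetical constant): multiplying by 12 == 3 * 2**2 makes
// the two low bits of the product always initialized, so the other operand's
// shadow Sx is instrumented as Sx * 4, i.e. Sx << 2.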
2913 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2914 Value *OtherArg) {
2915 Constant *ShadowMul;
2916 Type *Ty = ConstArg->getType();
2917 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2918 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2919 Type *EltTy = VTy->getElementType();
2920 SmallVector<Constant *, 16> Elements;
2921 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2922 if (ConstantInt *Elt =
2923 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2924 const APInt &V = Elt->getValue();
2925 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2926 Elements.push_back(ConstantInt::get(EltTy, V2));
2927 } else {
2928 Elements.push_back(ConstantInt::get(EltTy, 1));
2929 }
2930 }
2931 ShadowMul = ConstantVector::get(Elements);
2932 } else {
2933 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2934 const APInt &V = Elt->getValue();
2935 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2936 ShadowMul = ConstantInt::get(Ty, V2);
2937 } else {
2938 ShadowMul = ConstantInt::get(Ty, 1);
2939 }
2940 }
2941
2942 IRBuilder<> IRB(&I);
2943 setShadow(&I,
2944 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2945 setOrigin(&I, getOrigin(OtherArg));
2946 }
2947
2948 void visitMul(BinaryOperator &I) {
2949 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2950 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2951 if (constOp0 && !constOp1)
2952 handleMulByConstant(I, constOp0, I.getOperand(1));
2953 else if (constOp1 && !constOp0)
2954 handleMulByConstant(I, constOp1, I.getOperand(0));
2955 else
2956 handleShadowOr(I);
2957 }
2958
2959 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2960 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2961 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2962 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2963 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2964 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2965
2966 void handleIntegerDiv(Instruction &I) {
2967 IRBuilder<> IRB(&I);
2968 // Strict on the second argument.
2969 insertCheckShadowOf(I.getOperand(1), &I);
2970 setShadow(&I, getShadow(&I, 0));
2971 setOrigin(&I, getOrigin(&I, 0));
2972 }
2973
2974 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2975 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2976 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2977 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2978
2979 // Floating point division is side-effect free. We cannot require that the
2980 // divisor be fully initialized, so we just propagate shadow. See PR37523.
2981 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2982 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2983
2984 /// Instrument == and != comparisons.
2985 ///
2986 /// Sometimes the comparison result is known even if some of the bits of the
2987 /// arguments are not.
2988 void handleEqualityComparison(ICmpInst &I) {
2989 IRBuilder<> IRB(&I);
2990 Value *A = I.getOperand(0);
2991 Value *B = I.getOperand(1);
2992 Value *Sa = getShadow(A);
2993 Value *Sb = getShadow(B);
2994
2995 // Get rid of pointers and vectors of pointers.
2996 // For ints (and vectors of ints), types of A and Sa match,
2997 // and this is a no-op.
2998 A = IRB.CreatePointerCast(A, Sa->getType());
2999 B = IRB.CreatePointerCast(B, Sb->getType());
3000
3001 // A == B <==> (C = A^B) == 0
3002 // A != B <==> (C = A^B) != 0
3003 // Sc = Sa | Sb
3004 Value *C = IRB.CreateXor(A, B);
3005 Value *Sc = IRB.CreateOr(Sa, Sb);
3006 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
3007 // Result is defined if one of the following is true
3008 // * there is a defined 1 bit in C
3009 // * C is fully defined
3010 // Si = !(C & ~Sc) && Sc
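// Illustrative 4-bit example (hypothetical values): A = 0b10?? with
// Sa = 0b0011 and B = 0b0100 with Sb = 0 give a defined 1 in bit 3 of C,
// so (C & ~Sc) != 0 and Si = 0: the comparison is defined (A != B)
// regardless of the unknown bits.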
3011 Value *Zero = Constant::getNullValue(Sc->getType());
3012 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
3013 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
3014 Value *RHS =
3015 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
3016 Value *Si = IRB.CreateAnd(LHS, RHS);
3017 Si->setName("_msprop_icmp");
3018 setShadow(&I, Si);
3019 setOriginForNaryOp(I);
3020 }
3021
3022 /// Instrument relational comparisons.
3023 ///
3024 /// This function does exact shadow propagation for all relational
3025 /// comparisons of integers, pointers and vectors of those.
3026 /// FIXME: output seems suboptimal when one of the operands is a constant
3027 void handleRelationalComparisonExact(ICmpInst &I) {
3028 IRBuilder<> IRB(&I);
3029 Value *A = I.getOperand(0);
3030 Value *B = I.getOperand(1);
3031 Value *Sa = getShadow(A);
3032 Value *Sb = getShadow(B);
3033
3034 // Get rid of pointers and vectors of pointers.
3035 // For ints (and vectors of ints), types of A and Sa match,
3036 // and this is a no-op.
3037 A = IRB.CreatePointerCast(A, Sa->getType());
3038 B = IRB.CreatePointerCast(B, Sb->getType());
3039
3040 // Let [a0, a1] be the interval of possible values of A, taking into account
3041 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3042 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
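// Illustrative example (hypothetical values): for unsigned 4-bit
// A = 0b01?? with Sa = 0b0011, [a0, a1] = [0b0100, 0b0111]. Comparing
// with the constant 8 gives (a0 < 8) == (a1 < 8), so "A < 8" is defined;
// comparing with 6 gives differing results, so the shadow bit is set.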
3043 bool IsSigned = I.isSigned();
3044
3045 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3046 if (IsSigned) {
3047 // Sign-flip to map from signed range to unsigned range. Relation A vs B
3048 // should be preserved, if checked with `getUnsignedPredicate()`.
3049 // The relationship between Amin, Amax, Bmin, Bmax will also not be
3050 // affected, as they are created by effectively adding/subtracting from
3051 // A (or B) a value, derived from shadow, with no overflow, either
3052 // before or after sign flip.
3053 APInt MinVal =
3054 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3055 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3056 }
3057 // Minimize undefined bits.
3058 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3059 Value *Max = IRB.CreateOr(V, S);
3060 return std::make_pair(Min, Max);
3061 };
3062
3063 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3064 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3065 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3066 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3067
3068 Value *Si = IRB.CreateXor(S1, S2);
3069 setShadow(&I, Si);
3070 setOriginForNaryOp(I);
3071 }
3072
3073 /// Instrument signed relational comparisons.
3074 ///
3075 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3076 /// bit of the shadow. Everything else is delegated to handleShadowOr().
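/// For example, "x < 0" depends only on the sign bit of x, so the result's
/// shadow is just the sign bit of x's shadow, extracted below by comparing
/// the shadow against the clean (zero) shadow with a signed less-than.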
3077 void handleSignedRelationalComparison(ICmpInst &I) {
3078 Constant *constOp;
3079 Value *op = nullptr;
3080 CmpInst::Predicate pre;
3081 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3082 op = I.getOperand(0);
3083 pre = I.getPredicate();
3084 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3085 op = I.getOperand(1);
3086 pre = I.getSwappedPredicate();
3087 } else {
3088 handleShadowOr(I);
3089 return;
3090 }
3091
3092 if ((constOp->isNullValue() &&
3093 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3094 (constOp->isAllOnesValue() &&
3095 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3096 IRBuilder<> IRB(&I);
3097 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3098 "_msprop_icmp_s");
3099 setShadow(&I, Shadow);
3100 setOrigin(&I, getOrigin(op));
3101 } else {
3102 handleShadowOr(I);
3103 }
3104 }
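// Illustrative note (not from the original source): x < 0 depends only on
// the sign bit of x, so the result shadow is simply "is the sign bit of the
// shadow set", computed as (shadow <s 0). A shadow of 0b0000'0001 yields a
// clean result even though bit 0 of x is unknown, while a shadow of
// 0b1000'0000 poisons the comparison because the sign of x is unknown.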
3105
3106 void visitICmpInst(ICmpInst &I) {
3107 if (!ClHandleICmp) {
3108 handleShadowOr(I);
3109 return;
3110 }
3111 if (I.isEquality()) {
3112 handleEqualityComparison(I);
3113 return;
3114 }
3115
3116 assert(I.isRelational());
3117 if (ClHandleICmpExact) {
3118 handleRelationalComparisonExact(I);
3119 return;
3120 }
3121 if (I.isSigned()) {
3122 handleSignedRelationalComparison(I);
3123 return;
3124 }
3125
3126 assert(I.isUnsigned());
3127 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3128 handleRelationalComparisonExact(I);
3129 return;
3130 }
3131
3132 handleShadowOr(I);
3133 }
3134
3135 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3136
3137 void handleShift(BinaryOperator &I) {
3138 IRBuilder<> IRB(&I);
3139 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3140 // Otherwise perform the same shift on S1.
3141 Value *S1 = getShadow(&I, 0);
3142 Value *S2 = getShadow(&I, 1);
3143 Value *S2Conv =
3144 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3145 Value *V2 = I.getOperand(1);
3146 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3147 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3148 setOriginForNaryOp(I);
3149 }
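// Worked example (illustrative, not from the original source): for
// %r = shl i8 %a, 2 with shadow(%a) = 0b0000'1100 and a fully initialized
// shift amount, the result shadow is shl(0b0000'1100, 2) = 0b0011'0000:
// unknown bits move together with the data. If the shift amount itself has
// any poisoned bit, S2Conv is all-ones and the whole result is poisoned.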
3150
3151 void visitShl(BinaryOperator &I) { handleShift(I); }
3152 void visitAShr(BinaryOperator &I) { handleShift(I); }
3153 void visitLShr(BinaryOperator &I) { handleShift(I); }
3154
3155 void handleFunnelShift(IntrinsicInst &I) {
3156 IRBuilder<> IRB(&I);
3157 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3158 // Otherwise perform the same shift on S0 and S1.
3159 Value *S0 = getShadow(&I, 0);
3160 Value *S1 = getShadow(&I, 1);
3161 Value *S2 = getShadow(&I, 2);
3162 Value *S2Conv =
3163 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3164 Value *V2 = I.getOperand(2);
3165 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3166 {S0, S1, V2});
3167 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3168 setOriginForNaryOp(I);
3169 }
3170
3171 /// Instrument llvm.memmove
3172 ///
3173 /// At this point we don't know if llvm.memmove will be inlined or not.
3174 /// If we don't instrument it and it gets inlined,
3175 /// our interceptor will not kick in and we will lose the memmove.
3176 /// If we instrument the call here, but it does not get inlined,
3177 /// we will memmove the shadow twice, which is bad in the case
3178 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3179 ///
3180 /// Similar situation exists for memcpy and memset.
3181 void visitMemMoveInst(MemMoveInst &I) {
3182 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3183 IRBuilder<> IRB(&I);
3184 IRB.CreateCall(MS.MemmoveFn,
3185 {I.getArgOperand(0), I.getArgOperand(1),
3186 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3187 I.eraseFromParent();
3188 }
3189
3190 /// Instrument memcpy
3191 ///
3192 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3193 /// unfortunate as it may slow down small constant memcpys.
3194 /// FIXME: consider doing manual inline for small constant sizes and proper
3195 /// alignment.
3196 ///
3197 /// Note: This also handles memcpy.inline, which promises no calls to external
3198 /// functions as an optimization. However, with instrumentation enabled this
3199 /// is difficult to promise; additionally, we know that the MSan runtime
3200 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3201 /// instrumentation it's safe to turn memcpy.inline into a call to
3202 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3203 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3204 void visitMemCpyInst(MemCpyInst &I) {
3205 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3206 IRBuilder<> IRB(&I);
3207 IRB.CreateCall(MS.MemcpyFn,
3208 {I.getArgOperand(0), I.getArgOperand(1),
3209 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3210 I.eraseFromParent();
3211 }
3212
3213 // Same as memcpy.
3214 void visitMemSetInst(MemSetInst &I) {
3215 IRBuilder<> IRB(&I);
3216 IRB.CreateCall(
3217 MS.MemsetFn,
3218 {I.getArgOperand(0),
3219 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3220 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3221 I.eraseFromParent();
3222 }
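// Illustrative lowering (a sketch, assuming the usual __msan_* runtime entry
// points):
//   call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 %n, i1 false)
// becomes roughly
//   call ptr @__msan_memcpy(ptr %dst, ptr %src, i64 %n)
// so application bytes, shadow and (if enabled) origins are copied together
// by the runtime exactly once, whether or not the original intrinsic would
// have been inlined.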
3223
3224 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3225
3226 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3227
3228 /// Handle vector store-like intrinsics.
3229 ///
3230 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3231 /// has 1 pointer argument and 1 vector argument, returns void.
3232 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3233 assert(I.arg_size() == 2);
3234
3235 IRBuilder<> IRB(&I);
3236 Value *Addr = I.getArgOperand(0);
3237 Value *Shadow = getShadow(&I, 1);
3238 Value *ShadowPtr, *OriginPtr;
3239
3240 // We don't know the pointer alignment (could be unaligned SSE store!).
3241 // Have to assume the worst case.
3242 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3243 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3244 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3245
3246 if (ClCheckAccessAddress)
3247 insertCheckShadowOf(Addr, &I);
3248
3249 // FIXME: factor out common code from materializeStores
3250 if (MS.TrackOrigins)
3251 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3252 return true;
3253 }
3254
3255 /// Handle vector load-like intrinsics.
3256 ///
3257 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3258 /// has 1 pointer argument, returns a vector.
3259 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3260 assert(I.arg_size() == 1);
3261
3262 IRBuilder<> IRB(&I);
3263 Value *Addr = I.getArgOperand(0);
3264
3265 Type *ShadowTy = getShadowTy(&I);
3266 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3267 if (PropagateShadow) {
3268 // We don't know the pointer alignment (could be unaligned SSE load!).
3269 // Have to assume the worst case.
3270 const Align Alignment = Align(1);
3271 std::tie(ShadowPtr, OriginPtr) =
3272 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3273 setShadow(&I,
3274 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3275 } else {
3276 setShadow(&I, getCleanShadow(&I));
3277 }
3278
3279 if (ClCheckAccessAddress)
3280 insertCheckShadowOf(Addr, &I);
3281
3282 if (MS.TrackOrigins) {
3283 if (PropagateShadow)
3284 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3285 else
3286 setOrigin(&I, getCleanOrigin());
3287 }
3288 return true;
3289 }
3290
3291 /// Handle (SIMD arithmetic)-like intrinsics.
3292 ///
3293 /// Instrument intrinsics with any number of arguments of the same type [*],
3294 /// equal to the return type, plus a specified number of trailing flags of
3295 /// any type.
3296 ///
3297 /// [*] The type should be simple (no aggregates or pointers; vectors are
3298 /// fine).
3299 ///
3300 /// Caller guarantees that this intrinsic does not access memory.
3301 ///
3302 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3303 /// by this handler. See horizontalReduce().
3304 ///
3305 /// TODO: permutation intrinsics are also often incorrectly matched.
3306 [[maybe_unused]] bool
3307 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3308 unsigned int trailingFlags) {
3309 Type *RetTy = I.getType();
3310 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3311 return false;
3312
3313 unsigned NumArgOperands = I.arg_size();
3314 assert(NumArgOperands >= trailingFlags);
3315 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3316 Type *Ty = I.getArgOperand(i)->getType();
3317 if (Ty != RetTy)
3318 return false;
3319 }
3320
3321 IRBuilder<> IRB(&I);
3322 ShadowAndOriginCombiner SC(this, IRB);
3323 for (unsigned i = 0; i < NumArgOperands; ++i)
3324 SC.Add(I.getArgOperand(i));
3325 SC.Done(&I);
3326
3327 return true;
3328 }
3329
3330 /// Returns whether it was able to heuristically instrument unknown
3331 /// intrinsics.
3332 ///
3333 /// The main purpose of this code is to do something reasonable with all
3334 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3335 /// We recognize several classes of intrinsics by their argument types and
3336 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3337 /// sure that we know what the intrinsic does.
3338 ///
3339 /// We special-case intrinsics where this approach fails. See llvm.bswap
3340 /// handling as an example of that.
3341 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3342 unsigned NumArgOperands = I.arg_size();
3343 if (NumArgOperands == 0)
3344 return false;
3345
3346 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3347 I.getArgOperand(1)->getType()->isVectorTy() &&
3348 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3349 // This looks like a vector store.
3350 return handleVectorStoreIntrinsic(I);
3351 }
3352
3353 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3354 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3355 // This looks like a vector load.
3356 return handleVectorLoadIntrinsic(I);
3357 }
3358
3359 if (I.doesNotAccessMemory())
3360 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3361 return true;
3362
3363 // FIXME: detect and handle SSE maskstore/maskload?
3364 // Some cases are now handled in handleAVXMasked{Load,Store}.
3365 return false;
3366 }
3367
3368 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3369 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3370 if (ClDumpStrictIntrinsics)
3371 dumpInst(I);
3372
3373 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3374 << "\n");
3375 return true;
3376 } else
3377 return false;
3378 }
3379
3380 void handleInvariantGroup(IntrinsicInst &I) {
3381 setShadow(&I, getShadow(&I, 0));
3382 setOrigin(&I, getOrigin(&I, 0));
3383 }
3384
3385 void handleLifetimeStart(IntrinsicInst &I) {
3386 if (!PoisonStack)
3387 return;
3388 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3389 if (AI)
3390 LifetimeStartList.push_back(std::make_pair(&I, AI));
3391 }
3392
3393 void handleBswap(IntrinsicInst &I) {
3394 IRBuilder<> IRB(&I);
3395 Value *Op = I.getArgOperand(0);
3396 Type *OpType = Op->getType();
3397 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3398 getShadow(Op)));
3399 setOrigin(&I, getOrigin(Op));
3400 }
3401
3402 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3403 // and a 1. If the input is all zero, it is fully initialized iff
3404 // !is_zero_poison.
3405 //
3406 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3407 // concrete value 0/1, and ? is an uninitialized bit:
3408 // - 0001 0??? is fully initialized
3409 // - 000? ???? is fully uninitialized (*)
3410 // - ???? ???? is fully uninitialized
3411 // - 0000 0000 is fully uninitialized if is_zero_poison,
3412 // fully initialized otherwise
3413 //
3414 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3415 // only need to poison 4 bits.
3416 //
3417 // OutputShadow =
3418 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3419 // || (is_zero_poison && AllZeroSrc)
3420 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3421 IRBuilder<> IRB(&I);
3422 Value *Src = I.getArgOperand(0);
3423 Value *SrcShadow = getShadow(Src);
3424
3425 Value *False = IRB.getInt1(false);
3426 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3427 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3428 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3429 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3430
3431 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3432 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3433
3434 Value *NotAllZeroShadow =
3435 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3436 Value *OutputShadow =
3437 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3438
3439 // If zero poison is requested, mix in with the shadow
3440 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3441 if (!IsZeroPoison->isZeroValue()) {
3442 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3443 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3444 }
3445
3446 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3447
3448 setShadow(&I, OutputShadow);
3449 setOriginForNaryOp(I);
3450 }
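// Worked example (illustrative, not from the original source), ctlz on i8:
// for Src = 0b0001'0??? with SrcShadow = 0b0000'0111, ShadowZerosCount = 5
// is the position of the first poisoned bit and ConcreteZerosCount = 3
// regardless of the unknown bits, so 3 >= 5 is false and the result is
// clean. For Src = 0b000?'???? (SrcShadow = 0b0001'1111), ShadowZerosCount
// is 3 and the leading-zero count necessarily reaches the poisoned region,
// so the result is poisoned, matching the table above.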
3451
3452 /// Handle Arm NEON vector convert intrinsics.
3453 ///
3454 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3455 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64 (double)
3456 ///
3457 /// For conversions to or from fixed-point, there is a trailing argument to
3458 /// indicate the fixed-point precision:
3459 /// - <4 x float> llvm.aarch64.neon.vcvtfxs2fp.v4f32.v4i32(<4 x i32>, i32)
3460 /// - <4 x i32> llvm.aarch64.neon.vcvtfp2fxu.v4i32.v4f32(<4 x float>, i32)
3461 ///
3462 /// For x86 SSE vector convert intrinsics, see
3463 /// handleSSEVectorConvertIntrinsic().
3464 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I, bool FixedPoint) {
3465 if (FixedPoint)
3466 assert(I.arg_size() == 2);
3467 else
3468 assert(I.arg_size() == 1);
3469
3470 IRBuilder<> IRB(&I);
3471 Value *S0 = getShadow(&I, 0);
3472
3473 if (FixedPoint) {
3474 Value *Precision = I.getOperand(1);
3475 insertCheckShadowOf(Precision, &I);
3476 }
3477
3478 /// For scalars:
3479 /// Since they are converting from floating-point to integer, the output is
3480 /// - fully uninitialized if *any* bit of the input is uninitialized
3481 /// - fully initialized if all bits of the input are initialized
3482 /// We apply the same principle on a per-field basis for vectors.
3483 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3484 getShadowTy(&I));
3485 setShadow(&I, OutShadow);
3486 setOriginForNaryOp(I);
3487 }
3488
3489 /// Some instructions have additional zero-elements in the return type
3490 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3491 ///
3492 /// This function will return a vector type with the same number of elements
3493 /// as the input, but the same per-element width as the return value, e.g.,
3494 /// <8 x i8>.
3495 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3496 assert(isa<FixedVectorType>(getShadowTy(&I)));
3497 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3498
3499 // TODO: generalize beyond 2x?
3500 if (ShadowType->getElementCount() ==
3501 cast<VectorType>(Src->getType())->getElementCount() * 2)
3502 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3503
3504 assert(ShadowType->getElementCount() ==
3505 cast<VectorType>(Src->getType())->getElementCount());
3506
3507 return ShadowType;
3508 }
3509
3510 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3511 /// to match the length of the shadow for the instruction.
3512 /// If scalar types of the vectors are different, it will use the type of the
3513 /// input vector.
3514 /// This is more type-safe than CreateShadowCast().
3515 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3516 IRBuilder<> IRB(&I);
3517 assert(isa<FixedVectorType>(Shadow->getType()));
3518 assert(isa<FixedVectorType>(I.getType()));
3519
3520 Value *FullShadow = getCleanShadow(&I);
3521 unsigned ShadowNumElems =
3522 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3523 unsigned FullShadowNumElems =
3524 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3525
3526 assert((ShadowNumElems == FullShadowNumElems) ||
3527 (ShadowNumElems * 2 == FullShadowNumElems));
3528
3529 if (ShadowNumElems == FullShadowNumElems) {
3530 FullShadow = Shadow;
3531 } else {
3532 // TODO: generalize beyond 2x?
3533 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3534 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3535
3536 // Append zeros
3537 FullShadow =
3538 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3539 }
3540
3541 return FullShadow;
3542 }
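// Illustrative sketch (not from the original source): widening an <8 x i16>
// shadow to a <16 x i16> instruction shadow uses a shuffle mask of
// [0, 1, ..., 15]; indices 0..7 select the original shadow lanes and
// indices 8..15 fall into the second operand (a clean shadow), so the
// appended lanes are zero, i.e. initialized.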
3543
3544 /// Handle x86 SSE vector conversion.
3545 ///
3546 /// e.g., single-precision to half-precision conversion:
3547 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3548 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3549 ///
3550 /// floating-point to integer:
3551 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3552 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3553 ///
3554 /// Note: if the output has more elements, they are zero-initialized (and
3555 /// therefore the shadow will also be initialized).
3556 ///
3557 /// This differs from handleSSEVectorConvertIntrinsic() because it
3558 /// propagates uninitialized shadow (instead of checking the shadow).
3559 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3560 bool HasRoundingMode) {
3561 if (HasRoundingMode) {
3562 assert(I.arg_size() == 2);
3563 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3564 assert(RoundingMode->getType()->isIntegerTy());
3565 } else {
3566 assert(I.arg_size() == 1);
3567 }
3568
3569 Value *Src = I.getArgOperand(0);
3570 assert(Src->getType()->isVectorTy());
3571
3572 // The return type might have more elements than the input.
3573 // Temporarily shrink the return type's number of elements.
3574 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3575
3576 IRBuilder<> IRB(&I);
3577 Value *S0 = getShadow(&I, 0);
3578
3579 /// For scalars:
3580 /// Since they are converting to and/or from floating-point, the output is:
3581 /// - fully uninitialized if *any* bit of the input is uninitialized
3582 /// - fully initialized if all bits of the input are initialized
3583 /// We apply the same principle on a per-field basis for vectors.
3584 Value *Shadow =
3585 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3586
3587 // The return type might have more elements than the input.
3588 // Extend the return type back to its original width if necessary.
3589 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3590
3591 setShadow(&I, FullShadow);
3592 setOriginForNaryOp(I);
3593 }
3594
3595 // Instrument x86 SSE vector convert intrinsic.
3596 //
3597 // This function instruments intrinsics like cvtsi2ss:
3598 // %Out = int_xxx_cvtyyy(%ConvertOp)
3599 // or
3600 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3601 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3602 // number \p Out elements, and (if has 2 arguments) copies the rest of the
3603 // elements from \p CopyOp.
3604 // In most cases the conversion involves a floating-point value, which may
3605 // trigger a hardware exception when not fully initialized. For this reason we require
3606 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3607 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3608 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3609 // return a fully initialized value.
3610 //
3611 // For Arm NEON vector convert intrinsics, see
3612 // handleNEONVectorConvertIntrinsic().
3613 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3614 bool HasRoundingMode = false) {
3615 IRBuilder<> IRB(&I);
3616 Value *CopyOp, *ConvertOp;
3617
3618 assert((!HasRoundingMode ||
3619 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3620 "Invalid rounding mode");
3621
3622 switch (I.arg_size() - HasRoundingMode) {
3623 case 2:
3624 CopyOp = I.getArgOperand(0);
3625 ConvertOp = I.getArgOperand(1);
3626 break;
3627 case 1:
3628 ConvertOp = I.getArgOperand(0);
3629 CopyOp = nullptr;
3630 break;
3631 default:
3632 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3633 }
3634
3635 // The first *NumUsedElements* elements of ConvertOp are converted to the
3636 // same number of output elements. The rest of the output is copied from
3637 // CopyOp, or (if not available) filled with zeroes.
3638 // Combine shadow for elements of ConvertOp that are used in this operation,
3639 // and insert a check.
3640 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3641 // int->any conversion.
3642 Value *ConvertShadow = getShadow(ConvertOp);
3643 Value *AggShadow = nullptr;
3644 if (ConvertOp->getType()->isVectorTy()) {
3645 AggShadow = IRB.CreateExtractElement(
3646 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3647 for (int i = 1; i < NumUsedElements; ++i) {
3648 Value *MoreShadow = IRB.CreateExtractElement(
3649 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3650 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3651 }
3652 } else {
3653 AggShadow = ConvertShadow;
3654 }
3655 assert(AggShadow->getType()->isIntegerTy());
3656 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3657
3658 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3659 // ConvertOp.
3660 if (CopyOp) {
3661 assert(CopyOp->getType() == I.getType());
3662 assert(CopyOp->getType()->isVectorTy());
3663 Value *ResultShadow = getShadow(CopyOp);
3664 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3665 for (int i = 0; i < NumUsedElements; ++i) {
3666 ResultShadow = IRB.CreateInsertElement(
3667 ResultShadow, ConstantInt::getNullValue(EltTy),
3668 ConstantInt::get(IRB.getInt32Ty(), i));
3669 }
3670 setShadow(&I, ResultShadow);
3671 setOrigin(&I, getOrigin(CopyOp));
3672 } else {
3673 setShadow(&I, getCleanShadow(&I));
3674 setOrigin(&I, getCleanOrigin());
3675 }
3676 }
3677
3678 // Given a scalar or vector, extract the lower 64 bits (or fewer), and return all
3679 // zeroes if it is zero, and all ones otherwise.
3680 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3681 if (S->getType()->isVectorTy())
3682 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3683 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3684 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3685 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3686 }
3687
3688 // Given a vector, extract its first element, and return all
3689 // zeroes if it is zero, and all ones otherwise.
3690 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3691 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3692 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3693 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3694 }
3695
3696 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3697 Type *T = S->getType();
3698 assert(T->isVectorTy());
3699 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3700 return IRB.CreateSExt(S2, T);
3701 }
3702
3703 // Instrument vector shift intrinsic.
3704 //
3705 // This function instruments intrinsics like int_x86_avx2_psll_w.
3706 // Intrinsic shifts %In by %ShiftSize bits.
3707 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3708 // size, and the rest is ignored. Behavior is defined even if shift size is
3709 // greater than register (or field) width.
3710 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3711 assert(I.arg_size() == 2);
3712 IRBuilder<> IRB(&I);
3713 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3714 // Otherwise perform the same shift on S1.
3715 Value *S1 = getShadow(&I, 0);
3716 Value *S2 = getShadow(&I, 1);
3717 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3718 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3719 Value *V1 = I.getOperand(0);
3720 Value *V2 = I.getOperand(1);
3721 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3722 {IRB.CreateBitCast(S1, V1->getType()), V2});
3723 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3724 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3725 setOriginForNaryOp(I);
3726 }
3727
3728 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3729 // vectors.
3730 Type *getMMXVectorTy(unsigned EltSizeInBits,
3731 unsigned X86_MMXSizeInBits = 64) {
3732 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3733 "Illegal MMX vector element size");
3734 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3735 X86_MMXSizeInBits / EltSizeInBits);
3736 }
3737
3738 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3739 // intrinsic.
3740 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3741 switch (id) {
3742 case Intrinsic::x86_sse2_packsswb_128:
3743 case Intrinsic::x86_sse2_packuswb_128:
3744 return Intrinsic::x86_sse2_packsswb_128;
3745
3746 case Intrinsic::x86_sse2_packssdw_128:
3747 case Intrinsic::x86_sse41_packusdw:
3748 return Intrinsic::x86_sse2_packssdw_128;
3749
3750 case Intrinsic::x86_avx2_packsswb:
3751 case Intrinsic::x86_avx2_packuswb:
3752 return Intrinsic::x86_avx2_packsswb;
3753
3754 case Intrinsic::x86_avx2_packssdw:
3755 case Intrinsic::x86_avx2_packusdw:
3756 return Intrinsic::x86_avx2_packssdw;
3757
3758 case Intrinsic::x86_mmx_packsswb:
3759 case Intrinsic::x86_mmx_packuswb:
3760 return Intrinsic::x86_mmx_packsswb;
3761
3762 case Intrinsic::x86_mmx_packssdw:
3763 return Intrinsic::x86_mmx_packssdw;
3764
3765 case Intrinsic::x86_avx512_packssdw_512:
3766 case Intrinsic::x86_avx512_packusdw_512:
3767 return Intrinsic::x86_avx512_packssdw_512;
3768
3769 case Intrinsic::x86_avx512_packsswb_512:
3770 case Intrinsic::x86_avx512_packuswb_512:
3771 return Intrinsic::x86_avx512_packsswb_512;
3772
3773 default:
3774 llvm_unreachable("unexpected intrinsic id");
3775 }
3776 }
3777
3778 // Instrument vector pack intrinsic.
3779 //
3780 // This function instruments intrinsics like x86_mmx_packsswb, that
3781 // packs elements of 2 input vectors into half as many bits with saturation.
3782 // Shadow is propagated with the signed variant of the same intrinsic applied
3783 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3784 // MMXEltSizeInBits is used only for x86mmx arguments.
3785 //
3786 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
3787 void handleVectorPackIntrinsic(IntrinsicInst &I,
3788 unsigned MMXEltSizeInBits = 0) {
3789 assert(I.arg_size() == 2);
3790 IRBuilder<> IRB(&I);
3791 Value *S1 = getShadow(&I, 0);
3792 Value *S2 = getShadow(&I, 1);
3793 assert(S1->getType()->isVectorTy());
3794
3795 // SExt and ICmpNE below must apply to individual elements of input vectors.
3796 // In case of x86mmx arguments, cast them to appropriate vector types and
3797 // back.
3798 Type *T =
3799 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3800 if (MMXEltSizeInBits) {
3801 S1 = IRB.CreateBitCast(S1, T);
3802 S2 = IRB.CreateBitCast(S2, T);
3803 }
3804 Value *S1_ext =
3805 IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3806 Value *S2_ext =
3807 IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3808 if (MMXEltSizeInBits) {
3809 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3810 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3811 }
3812
3813 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3814 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3815 "_msprop_vector_pack");
3816 if (MMXEltSizeInBits)
3817 S = IRB.CreateBitCast(S, getShadowTy(&I));
3818 setShadow(&I, S);
3819 setOriginForNaryOp(I);
3820 }
3821
3822 // Convert `Mask` into `<n x i1>`.
3823 Constant *createDppMask(unsigned Width, unsigned Mask) {
3824 SmallVector<Constant *, 4> R(Width);
3825 for (auto &M : R) {
3826 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3827 Mask >>= 1;
3828 }
3829 return ConstantVector::get(R);
3830 }
3831
3832 // Calculate the output shadow as an array of booleans `<n x i1>`, assuming that
3833 // if any arg is poisoned, the entire dot product is poisoned.
3834 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3835 unsigned DstMask) {
3836 const unsigned Width =
3837 cast<FixedVectorType>(S->getType())->getNumElements();
3838
3839 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3840 Constant::getNullValue(S->getType()));
3841 Value *SElem = IRB.CreateOrReduce(S);
3842 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3843 Value *DstMaskV = createDppMask(Width, DstMask);
3844
3845 return IRB.CreateSelect(
3846 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3847 }
3848
3849 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3850 //
3851 // The 2- and 4-element versions produce a single scalar dot product and then
3852 // put it into the elements of the output vector selected by the 4 lowest bits
3853 // of the mask. The top 4 bits of the mask control which elements of the input
3854 // to use for the dot product.
3855 //
3856 // The 8-element version's mask still has only 4 bits for the input and 4 bits
3857 // for the output mask. According to the spec it just operates as the 4-element
3858 // version on the first 4 elements of inputs and output, and then on the last 4
3859 // elements of inputs and output.
3860 void handleDppIntrinsic(IntrinsicInst &I) {
3861 IRBuilder<> IRB(&I);
3862
3863 Value *S0 = getShadow(&I, 0);
3864 Value *S1 = getShadow(&I, 1);
3865 Value *S = IRB.CreateOr(S0, S1);
3866
3867 const unsigned Width =
3868 cast<FixedVectorType>(S->getType())->getNumElements();
3869 assert(Width == 2 || Width == 4 || Width == 8);
3870
3871 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3872 const unsigned SrcMask = Mask >> 4;
3873 const unsigned DstMask = Mask & 0xf;
3874
3875 // Calculate shadow as `<n x i1>`.
3876 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3877 if (Width == 8) {
3878 // First 4 elements of shadow are already calculated. `findDppPoisonedOutput`
3879 // operates on 32-bit masks, so we can just shift the masks and repeat.
3880 SI1 = IRB.CreateOr(
3881 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3882 }
3883 // Extend to the real size of the shadow, poisoning either all or none of the
3884 // bits of an element.
3885 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3886
3887 setShadow(&I, S);
3888 setOriginForNaryOp(I);
3889 }
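// Worked example (illustrative, not from the original source): for a dpps
// call with mask 0x71, SrcMask = 0x7 selects input elements 0..2 and
// DstMask = 0x1 writes the dot product into element 0. If any shadow bit of
// elements 0..2 of either input is set, the OR-reduction is non-zero and
// element 0 of the result is poisoned; element 3's shadow is ignored and
// elements 1..3 of the result stay clean.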
3890
3891 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3892 C = CreateAppToShadowCast(IRB, C);
3893 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3894 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3895 C = IRB.CreateAShr(C, ElSize - 1);
3896 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3897 return IRB.CreateTrunc(C, FVT);
3898 }
3899
3900 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3901 void handleBlendvIntrinsic(IntrinsicInst &I) {
3902 Value *C = I.getOperand(2);
3903 Value *T = I.getOperand(1);
3904 Value *F = I.getOperand(0);
3905
3906 Value *Sc = getShadow(&I, 2);
3907 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3908
3909 {
3910 IRBuilder<> IRB(&I);
3911 // Extract top bit from condition and its shadow.
3912 C = convertBlendvToSelectMask(IRB, C);
3913 Sc = convertBlendvToSelectMask(IRB, Sc);
3914
3915 setShadow(C, Sc);
3916 setOrigin(C, Oc);
3917 }
3918
3919 handleSelectLikeInst(I, C, T, F);
3920 }
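// Illustrative note (not from the original source): blendv selects on the
// top bit of each condition element, so both the condition and its shadow
// are reduced to that bit (arithmetic shift right by ElSize - 1, then
// truncation to i1) before the generic select handling runs. Uninitialized
// low bits of the condition therefore do not poison the result; only an
// uninitialized top bit does.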
3921
3922 // Instrument sum-of-absolute-differences intrinsic.
3923 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3924 const unsigned SignificantBitsPerResultElement = 16;
3925 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3926 unsigned ZeroBitsPerResultElement =
3927 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3928
3929 IRBuilder<> IRB(&I);
3930 auto *Shadow0 = getShadow(&I, 0);
3931 auto *Shadow1 = getShadow(&I, 1);
3932 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3933 S = IRB.CreateBitCast(S, ResTy);
3934 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3935 ResTy);
3936 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3937 S = IRB.CreateBitCast(S, getShadowTy(&I));
3938 setShadow(&I, S);
3939 setOriginForNaryOp(I);
3940 }
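// Worked example (illustrative, not from the original source): for psadbw,
// each 64-bit result element is a sum of absolute differences that fits in
// its low 16 bits; the upper 48 bits are always zero. The computed shadow is
// therefore all-ones in the low 16 bits of a result element whenever any
// byte contributing to it is poisoned (sext of the != 0 compare, then lshr
// by ZeroBitsPerResultElement = 48), and zero in the guaranteed-zero bits.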
3941
3942 // Instrument dot-product / multiply-add(-accumulate)? intrinsics.
3943 //
3944 // e.g., Two operands:
3945 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3946 //
3947 // Two operands which require an EltSizeInBits override:
3948 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3949 //
3950 // Three operands:
3951 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3952 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3953 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
3954 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
3955 // (these are equivalent to multiply-add on %a and %b, followed by
3956 // adding/"accumulating" %s. "Accumulation" stores the result in one
3957 // of the source registers, but this accumulate vs. add distinction
3958 // is lost when dealing with LLVM intrinsics.)
3959 //
3960 // ZeroPurifies means that multiplying a known-zero with an uninitialized
3961 // value results in an initialized value. This is applicable for integer
3962 // multiplication, but not floating-point (counter-example: NaN).
3963 void handleVectorDotProductIntrinsic(IntrinsicInst &I,
3964 unsigned ReductionFactor,
3965 bool ZeroPurifies,
3966 unsigned EltSizeInBits,
3967 enum OddOrEvenLanes Lanes) {
3968 IRBuilder<> IRB(&I);
3969
3970 [[maybe_unused]] FixedVectorType *ReturnType =
3971 cast<FixedVectorType>(I.getType());
3972 assert(isa<FixedVectorType>(ReturnType));
3973
3974 // Vectors A and B, and shadows
3975 Value *Va = nullptr;
3976 Value *Vb = nullptr;
3977 Value *Sa = nullptr;
3978 Value *Sb = nullptr;
3979
3980 assert(I.arg_size() == 2 || I.arg_size() == 3);
3981 if (I.arg_size() == 2) {
3982 assert(Lanes == kBothLanes);
3983
3984 Va = I.getOperand(0);
3985 Vb = I.getOperand(1);
3986
3987 Sa = getShadow(&I, 0);
3988 Sb = getShadow(&I, 1);
3989 } else if (I.arg_size() == 3) {
3990 // Operand 0 is the accumulator. We will deal with that below.
3991 Va = I.getOperand(1);
3992 Vb = I.getOperand(2);
3993
3994 Sa = getShadow(&I, 1);
3995 Sb = getShadow(&I, 2);
3996
3997 if (Lanes == kEvenLanes || Lanes == kOddLanes) {
3998 // Convert < S0, S1, S2, S3, S4, S5, S6, S7 >
3999 // to < S0, S0, S2, S2, S4, S4, S6, S6 > (if even)
4000 // to < S1, S1, S3, S3, S5, S5, S7, S7 > (if odd)
4001 //
4002 // Note: for aarch64.neon.bfmlalb/t, the odd/even-indexed values are
4003 // zeroed, not duplicated. However, for shadow propagation, this
4004 // distinction is unimportant because Step 1 below will squeeze
4005 // each pair of elements (e.g., [S0, S0]) into a single bit, and
4006 // we only care if it is fully initialized.
4007
4008 FixedVectorType *InputShadowType = cast<FixedVectorType>(Sa->getType());
4009 unsigned Width = InputShadowType->getNumElements();
4010
4011 Sa = IRB.CreateShuffleVector(
4012 Sa, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4013 Sb = IRB.CreateShuffleVector(
4014 Sb, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4015 }
4016 }
4017
4018 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
4019 assert(ParamType == Vb->getType());
4020
4021 assert(ParamType->getPrimitiveSizeInBits() ==
4022 ReturnType->getPrimitiveSizeInBits());
4023
4024 if (I.arg_size() == 3) {
4025 [[maybe_unused]] auto *AccumulatorType =
4026 cast<FixedVectorType>(I.getOperand(0)->getType());
4027 assert(AccumulatorType == ReturnType);
4028 }
4029
4030 FixedVectorType *ImplicitReturnType =
4031 cast<FixedVectorType>(getShadowTy(ReturnType));
4032 // Step 1: instrument multiplication of corresponding vector elements
4033 if (EltSizeInBits) {
4034 ImplicitReturnType = cast<FixedVectorType>(
4035 getMMXVectorTy(EltSizeInBits * ReductionFactor,
4036 ParamType->getPrimitiveSizeInBits()));
4037 ParamType = cast<FixedVectorType>(
4038 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
4039
4040 Va = IRB.CreateBitCast(Va, ParamType);
4041 Vb = IRB.CreateBitCast(Vb, ParamType);
4042
4043 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
4044 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
4045 } else {
4046 assert(ParamType->getNumElements() ==
4047 ReturnType->getNumElements() * ReductionFactor);
4048 }
4049
4050 // Each element of the vector is represented by a single bit (poisoned or
4051 // not) e.g., <8 x i1>.
4052 Value *SaNonZero = IRB.CreateIsNotNull(Sa);
4053 Value *SbNonZero = IRB.CreateIsNotNull(Sb);
4054 Value *And;
4055 if (ZeroPurifies) {
4056 // Multiplying an *initialized* zero by an uninitialized element results
4057 // in an initialized zero element.
4058 //
4059 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4060 // results in an unpoisoned value.
4061 Value *VaInt = Va;
4062 Value *VbInt = Vb;
4063 if (!Va->getType()->isIntegerTy()) {
4064 VaInt = CreateAppToShadowCast(IRB, Va);
4065 VbInt = CreateAppToShadowCast(IRB, Vb);
4066 }
4067
4068 // We check for non-zero on a per-element basis, not per-bit.
4069 Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
4070 Value *VbNonZero = IRB.CreateIsNotNull(VbInt);
4071
4072 And = handleBitwiseAnd(IRB, VaNonZero, VbNonZero, SaNonZero, SbNonZero);
4073 } else {
4074 And = IRB.CreateOr({SaNonZero, SbNonZero});
4075 }
4076
4077 // Extend <8 x i1> to <8 x i16>.
4078 // (The real pmadd intrinsic would have computed intermediate values of
4079 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4080 // consider each element to be either fully initialized or fully
4081 // uninitialized.)
4082 And = IRB.CreateSExt(And, Sa->getType());
4083
4084 // Step 2: instrument horizontal add
4085 // We don't need bit-precise horizontalReduce because we only want to check
4086 // if each pair/quad of elements is fully zero.
4087 // Cast to <4 x i32>.
4088 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4089
4090 // Compute <4 x i1>, then extend back to <4 x i32>.
4091 Value *OutShadow = IRB.CreateSExt(
4092 IRB.CreateICmpNE(Horizontal,
4093 Constant::getNullValue(Horizontal->getType())),
4094 ImplicitReturnType);
4095
4096 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4097 // AVX, it is already correct).
4098 if (EltSizeInBits)
4099 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4100
4101 // Step 3 (if applicable): instrument accumulator
4102 if (I.arg_size() == 3)
4103 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4104
4105 setShadow(&I, OutShadow);
4106 setOriginForNaryOp(I);
4107 }
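// Worked example (illustrative, not from the original source): for
// llvm.x86.sse2.pmadd.wd each i32 output element is
// a[2i]*b[2i] + a[2i+1]*b[2i+1]. Step 1 marks a product as poisoned only if
// one factor is poisoned and the other is not a known (initialized) zero;
// Step 2 ORs each adjacent pair, so an output element is poisoned iff at
// least one of its two products is. With a third (accumulator) operand, its
// shadow is OR'ed into the result in Step 3.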
4108
4109 // Instrument compare-packed intrinsic.
4110 //
4111 // x86 has the predicate as the third operand, which is ImmArg e.g.,
4112 // - <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double>, <4 x double>, i8)
4113 // - <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double>, <2 x double>, i8)
4114 //
4115 // while Arm has separate intrinsics for >= and > e.g.,
4116 // - <2 x i32> @llvm.aarch64.neon.facge.v2i32.v2f32
4117 // (<2 x float> %A, <2 x float>)
4118 // - <2 x i32> @llvm.aarch64.neon.facgt.v2i32.v2f32
4119 // (<2 x float> %A, <2 x float>)
4120 void handleVectorComparePackedIntrinsic(IntrinsicInst &I,
4121 bool PredicateAsOperand) {
4122 if (PredicateAsOperand) {
4123 assert(I.arg_size() == 3);
4124 assert(I.paramHasAttr(2, Attribute::ImmArg));
4125 } else
4126 assert(I.arg_size() == 2);
4127
4128 IRBuilder<> IRB(&I);
4129
4130 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4131 // all-ones shadow.
4132 Type *ResTy = getShadowTy(&I);
4133 auto *Shadow0 = getShadow(&I, 0);
4134 auto *Shadow1 = getShadow(&I, 1);
4135 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4136 Value *S = IRB.CreateSExt(
4137 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4138 setShadow(&I, S);
4139 setOriginForNaryOp(I);
4140 }
4141
4142 // Instrument compare-scalar intrinsic.
4143 // This handles both cmp* intrinsics which return the result in the first
4144 // element of a vector, and comi* which return the result as i32.
4145 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4146 IRBuilder<> IRB(&I);
4147 auto *Shadow0 = getShadow(&I, 0);
4148 auto *Shadow1 = getShadow(&I, 1);
4149 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4150 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4151 setShadow(&I, S);
4152 setOriginForNaryOp(I);
4153 }
4154
4155 // Instrument generic vector reduction intrinsics
4156 // by ORing together all their fields.
4157 //
4158 // If AllowShadowCast is true, the return type does not need to be the same
4159 // type as the fields
4160 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4161 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4162 assert(I.arg_size() == 1);
4163
4164 IRBuilder<> IRB(&I);
4165 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4166 if (AllowShadowCast)
4167 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4168 else
4169 assert(S->getType() == getShadowTy(&I));
4170 setShadow(&I, S);
4171 setOriginForNaryOp(I);
4172 }
4173
4174 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4175 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4176 // %a1)
4177 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4178 //
4179 // The type of the return value, initial starting value, and elements of the
4180 // vector must be identical.
4181 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4182 assert(I.arg_size() == 2);
4183
4184 IRBuilder<> IRB(&I);
4185 Value *Shadow0 = getShadow(&I, 0);
4186 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4187 assert(Shadow0->getType() == Shadow1->getType());
4188 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4189 assert(S->getType() == getShadowTy(&I));
4190 setShadow(&I, S);
4191 setOriginForNaryOp(I);
4192 }
4193
4194 // Instrument vector.reduce.or intrinsic.
4195 // Valid (non-poisoned) set bits in the operand pull low the
4196 // corresponding shadow bits.
4197 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4198 assert(I.arg_size() == 1);
4199
4200 IRBuilder<> IRB(&I);
4201 Value *OperandShadow = getShadow(&I, 0);
4202 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4203 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4204 // Bit N is clean if any field's bit N is 1 and unpoisoned
4205 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4206 // Otherwise, it is clean if every field's bit N is unpoisoned
4207 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4208 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4209
4210 setShadow(&I, S);
4211 setOrigin(&I, getOrigin(&I, 0));
4212 }
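// Worked example (illustrative, not from the original source): reducing
// <2 x i8> lanes 0b0000'0001 (clean) and 0b0000'000? (bit 0 poisoned):
// bit 0 of the OR is 1 no matter what the poisoned bit holds, so the clean
// lane's set bit pulls the shadow low: AndReduce(~op | shadow) has bit 0
// clear. Every other bit is clean as well because no lane is poisoned
// there, which the final AND with OrReduce(shadow) captures.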
4213
4214 // Instrument vector.reduce.and intrinsic.
4215 // Valid (non-poisoned) unset bits in the operand pull down the
4216 // corresponding shadow bits.
4217 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4218 assert(I.arg_size() == 1);
4219
4220 IRBuilder<> IRB(&I);
4221 Value *OperandShadow = getShadow(&I, 0);
4222 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4223 // Bit N is clean if any field's bit N is 0 and unpoisoned
4224 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4225 // Otherwise, it is clean if every field's bit N is unpoisoned
4226 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4227 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4228
4229 setShadow(&I, S);
4230 setOrigin(&I, getOrigin(&I, 0));
4231 }
4232
4233 void handleStmxcsr(IntrinsicInst &I) {
4234 IRBuilder<> IRB(&I);
4235 Value *Addr = I.getArgOperand(0);
4236 Type *Ty = IRB.getInt32Ty();
4237 Value *ShadowPtr =
4238 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4239
4240 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4241
4242 if (ClCheckAccessAddress)
4243 insertCheckShadowOf(Addr, &I);
4244 }
4245
4246 void handleLdmxcsr(IntrinsicInst &I) {
4247 if (!InsertChecks)
4248 return;
4249
4250 IRBuilder<> IRB(&I);
4251 Value *Addr = I.getArgOperand(0);
4252 Type *Ty = IRB.getInt32Ty();
4253 const Align Alignment = Align(1);
4254 Value *ShadowPtr, *OriginPtr;
4255 std::tie(ShadowPtr, OriginPtr) =
4256 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4257
4258 if (ClCheckAccessAddress)
4259 insertCheckShadowOf(Addr, &I);
4260
4261 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4262 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4263 : getCleanOrigin();
4264 insertCheckShadow(Shadow, Origin, &I);
4265 }
4266
4267 void handleMaskedExpandLoad(IntrinsicInst &I) {
4268 IRBuilder<> IRB(&I);
4269 Value *Ptr = I.getArgOperand(0);
4270 MaybeAlign Align = I.getParamAlign(0);
4271 Value *Mask = I.getArgOperand(1);
4272 Value *PassThru = I.getArgOperand(2);
4273
4274 if (ClCheckAccessAddress) {
4275 insertCheckShadowOf(Ptr, &I);
4276 insertCheckShadowOf(Mask, &I);
4277 }
4278
4279 if (!PropagateShadow) {
4280 setShadow(&I, getCleanShadow(&I));
4281 setOrigin(&I, getCleanOrigin());
4282 return;
4283 }
4284
4285 Type *ShadowTy = getShadowTy(&I);
4286 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4287 auto [ShadowPtr, OriginPtr] =
4288 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4289
4290 Value *Shadow =
4291 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4292 getShadow(PassThru), "_msmaskedexpload");
4293
4294 setShadow(&I, Shadow);
4295
4296 // TODO: Store origins.
4297 setOrigin(&I, getCleanOrigin());
4298 }
4299
4300 void handleMaskedCompressStore(IntrinsicInst &I) {
4301 IRBuilder<> IRB(&I);
4302 Value *Values = I.getArgOperand(0);
4303 Value *Ptr = I.getArgOperand(1);
4304 MaybeAlign Align = I.getParamAlign(1);
4305 Value *Mask = I.getArgOperand(2);
4306
4307 if (ClCheckAccessAddress) {
4308 insertCheckShadowOf(Ptr, &I);
4309 insertCheckShadowOf(Mask, &I);
4310 }
4311
4312 Value *Shadow = getShadow(Values);
4313 Type *ElementShadowTy =
4314 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4315 auto [ShadowPtr, OriginPtrs] =
4316 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4317
4318 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4319
4320 // TODO: Store origins.
4321 }
4322
4323 void handleMaskedGather(IntrinsicInst &I) {
4324 IRBuilder<> IRB(&I);
4325 Value *Ptrs = I.getArgOperand(0);
4326 const Align Alignment = I.getParamAlign(0).valueOrOne();
4327 Value *Mask = I.getArgOperand(1);
4328 Value *PassThru = I.getArgOperand(2);
4329
4330 Type *PtrsShadowTy = getShadowTy(Ptrs);
4331 if (ClCheckAccessAddress) {
4332 insertCheckShadowOf(Mask, &I);
4333 Value *MaskedPtrShadow = IRB.CreateSelect(
4334 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4335 "_msmaskedptrs");
4336 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4337 }
4338
4339 if (!PropagateShadow) {
4340 setShadow(&I, getCleanShadow(&I));
4341 setOrigin(&I, getCleanOrigin());
4342 return;
4343 }
4344
4345 Type *ShadowTy = getShadowTy(&I);
4346 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4347 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4348 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4349
4350 Value *Shadow =
4351 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4352 getShadow(PassThru), "_msmaskedgather");
4353
4354 setShadow(&I, Shadow);
4355
4356 // TODO: Store origins.
4357 setOrigin(&I, getCleanOrigin());
4358 }
4359
4360 void handleMaskedScatter(IntrinsicInst &I) {
4361 IRBuilder<> IRB(&I);
4362 Value *Values = I.getArgOperand(0);
4363 Value *Ptrs = I.getArgOperand(1);
4364 const Align Alignment = I.getParamAlign(1).valueOrOne();
4365 Value *Mask = I.getArgOperand(2);
4366
4367 Type *PtrsShadowTy = getShadowTy(Ptrs);
4368 if (ClCheckAccessAddress) {
4369 insertCheckShadowOf(Mask, &I);
4370 Value *MaskedPtrShadow = IRB.CreateSelect(
4371 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4372 "_msmaskedptrs");
4373 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4374 }
4375
4376 Value *Shadow = getShadow(Values);
4377 Type *ElementShadowTy =
4378 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4379 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4380 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4381
4382 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4383
4384 // TODO: Store origin.
4385 }
4386
4387 // Intrinsic::masked_store
4388 //
4389 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4390 // stores are lowered to Intrinsic::masked_store.
4391 void handleMaskedStore(IntrinsicInst &I) {
4392 IRBuilder<> IRB(&I);
4393 Value *V = I.getArgOperand(0);
4394 Value *Ptr = I.getArgOperand(1);
4395 const Align Alignment = I.getParamAlign(1).valueOrOne();
4396 Value *Mask = I.getArgOperand(2);
4397 Value *Shadow = getShadow(V);
4398
4399 if (ClCheckAccessAddress) {
4400 insertCheckShadowOf(Ptr, &I);
4401 insertCheckShadowOf(Mask, &I);
4402 }
4403
4404 Value *ShadowPtr;
4405 Value *OriginPtr;
4406 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4407 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4408
4409 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4410
4411 if (!MS.TrackOrigins)
4412 return;
4413
4414 auto &DL = F.getDataLayout();
4415 paintOrigin(IRB, getOrigin(V), OriginPtr,
4416 DL.getTypeStoreSize(Shadow->getType()),
4417 std::max(Alignment, kMinOriginAlignment));
4418 }
4419
4420 // Intrinsic::masked_load
4421 //
4422 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4423 // loads are lowered to Intrinsic::masked_load.
4424 void handleMaskedLoad(IntrinsicInst &I) {
4425 IRBuilder<> IRB(&I);
4426 Value *Ptr = I.getArgOperand(0);
4427 const Align Alignment = I.getParamAlign(0).valueOrOne();
4428 Value *Mask = I.getArgOperand(1);
4429 Value *PassThru = I.getArgOperand(2);
4430
4431 if (ClCheckAccessAddress) {
4432 insertCheckShadowOf(Ptr, &I);
4433 insertCheckShadowOf(Mask, &I);
4434 }
4435
4436 if (!PropagateShadow) {
4437 setShadow(&I, getCleanShadow(&I));
4438 setOrigin(&I, getCleanOrigin());
4439 return;
4440 }
4441
4442 Type *ShadowTy = getShadowTy(&I);
4443 Value *ShadowPtr, *OriginPtr;
4444 std::tie(ShadowPtr, OriginPtr) =
4445 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4446 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4447 getShadow(PassThru), "_msmaskedld"));
4448
4449 if (!MS.TrackOrigins)
4450 return;
4451
4452 // Choose between PassThru's and the loaded value's origins.
4453 Value *MaskedPassThruShadow = IRB.CreateAnd(
4454 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4455
4456 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4457
4458 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4459 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4460
4461 setOrigin(&I, Origin);
4462 }
4463
4464 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4465 // dst mask src
4466 //
4467 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4468 // by handleMaskedStore.
4469 //
4470 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4471 // vector of integers, unlike the LLVM masked intrinsics, which require a
4472 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4473 // mentions that the x86 backend does not know how to efficiently convert
4474 // from a vector of booleans back into the AVX mask format; therefore, they
4475 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4476 // intrinsics.
4477 void handleAVXMaskedStore(IntrinsicInst &I) {
4478 assert(I.arg_size() == 3);
4479
4480 IRBuilder<> IRB(&I);
4481
4482 Value *Dst = I.getArgOperand(0);
4483 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4484
4485 Value *Mask = I.getArgOperand(1);
4486 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4487
4488 Value *Src = I.getArgOperand(2);
4489 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4490
4491 const Align Alignment = Align(1);
4492
4493 Value *SrcShadow = getShadow(Src);
4494
4495 if (ClCheckAccessAddress) {
4496 insertCheckShadowOf(Dst, &I);
4497 insertCheckShadowOf(Mask, &I);
4498 }
4499
4500 Value *DstShadowPtr;
4501 Value *DstOriginPtr;
4502 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4503 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4504
4505 SmallVector<Value *, 2> ShadowArgs;
4506 ShadowArgs.append(1, DstShadowPtr);
4507 ShadowArgs.append(1, Mask);
4508 // The intrinsic may require floating-point but shadows can be arbitrary
4509 // bit patterns, of which some would be interpreted as "invalid"
4510 // floating-point values (NaN etc.); we assume the intrinsic will happily
4511 // copy them.
4512 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4513
4514 CallInst *CI =
4515 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4516 setShadow(&I, CI);
4517
4518 if (!MS.TrackOrigins)
4519 return;
4520
4521 // Approximation only
4522 auto &DL = F.getDataLayout();
4523 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4524 DL.getTypeStoreSize(SrcShadow->getType()),
4525 std::max(Alignment, kMinOriginAlignment));
4526 }
4527
4528 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4529 // return src mask
4530 //
4531 // Masked-off values are replaced with 0, which conveniently also represents
4532 // initialized memory.
4533 //
4534 // AVX512 masked stores are lowered to Intrinsic::masked_load and are handled
4535 // by handleMaskedStore.
4536 //
4537 // We do not combine this with handleMaskedLoad; see comment in
4538 // handleAVXMaskedStore for the rationale.
4539 //
4540 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4541 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4542 // parameter.
4543 void handleAVXMaskedLoad(IntrinsicInst &I) {
4544 assert(I.arg_size() == 2);
4545
4546 IRBuilder<> IRB(&I);
4547
4548 Value *Src = I.getArgOperand(0);
4549 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4550
4551 Value *Mask = I.getArgOperand(1);
4552 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4553
4554 const Align Alignment = Align(1);
4555
4556 if (ClCheckAccessAddress) {
4557 insertCheckShadowOf(Mask, &I);
4558 }
4559
4560 Type *SrcShadowTy = getShadowTy(Src);
4561 Value *SrcShadowPtr, *SrcOriginPtr;
4562 std::tie(SrcShadowPtr, SrcOriginPtr) =
4563 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4564
4565 SmallVector<Value *, 2> ShadowArgs;
4566 ShadowArgs.append(1, SrcShadowPtr);
4567 ShadowArgs.append(1, Mask);
4568
4569 CallInst *CI =
4570 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4571 // The AVX masked load intrinsics do not have integer variants. We use the
4572 // floating-point variants, which will happily copy the shadows even if
4573 // they are interpreted as "invalid" floating-point values (NaN etc.).
4574 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4575
4576 if (!MS.TrackOrigins)
4577 return;
4578
4579 // The "pass-through" value is always zero (initialized). To the extent
4580 // that this results in initialized aligned 4-byte chunks, the origin value
4581 // is ignored. It is therefore correct to simply copy the origin from src.
4582 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4583 setOrigin(&I, PtrSrcOrigin);
4584 }
4585
4586 // Test whether the mask indices are initialized, only checking the bits that
4587 // are actually used.
4588 //
4589 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4590 // used/checked.
4591 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4592 assert(isFixedIntVector(Idx));
4593 auto IdxVectorSize =
4594 cast<FixedVectorType>(Idx->getType())->getNumElements();
4595 assert(isPowerOf2_64(IdxVectorSize));
4596
4597 // Compiler isn't smart enough, let's help it
4598 if (isa<Constant>(Idx))
4599 return;
4600
4601 auto *IdxShadow = getShadow(Idx);
4602 Value *Truncated = IRB.CreateTrunc(
4603 IdxShadow,
4604 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4605 IdxVectorSize));
4606 insertCheckShadow(Truncated, getOrigin(Idx), I);
4607 }
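// Illustrative note (not from the original source): the truncation avoids
// false positives. If an index element's only poisoned bit is, say, bit 15,
// the selected lane is the same for either value of that bit because only
// the low log2(N) bits participate; checking the full-width shadow would
// warn spuriously, while the truncated shadow stays clean.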
4608
4609 // Instrument AVX permutation intrinsic.
4610 // We apply the same permutation (argument index 1) to the shadow.
4611 void handleAVXVpermilvar(IntrinsicInst &I) {
4612 IRBuilder<> IRB(&I);
4613 Value *Shadow = getShadow(&I, 0);
4614 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4615
4616 // Shadows are integer-ish types but some intrinsics require a
4617 // different (e.g., floating-point) type.
4618 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4619 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4620 {Shadow, I.getArgOperand(1)});
4621
4622 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4623 setOriginForNaryOp(I);
4624 }
4625
4626 // Instrument AVX permutation intrinsic.
4627 // We apply the same permutation (argument index 1) to the shadows.
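// e.g., for illustration:
//   <16 x i32> @llvm.x86.avx512.vpermi2var.d.512
//       (<16 x i32> %a, <16 x i32> %idx, <16 x i32> %b)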
4628 void handleAVXVpermi2var(IntrinsicInst &I) {
4629 assert(I.arg_size() == 3);
4630 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4631 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4632 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4633 [[maybe_unused]] auto ArgVectorSize =
4634 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4635 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4636 ->getNumElements() == ArgVectorSize);
4637 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4638 ->getNumElements() == ArgVectorSize);
4639 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4640 assert(I.getType() == I.getArgOperand(0)->getType());
4641 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4642 IRBuilder<> IRB(&I);
4643 Value *AShadow = getShadow(&I, 0);
4644 Value *Idx = I.getArgOperand(1);
4645 Value *BShadow = getShadow(&I, 2);
4646
4647 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4648
4649 // Shadows are integer-ish types but some intrinsics require a
4650 // different (e.g., floating-point) type.
4651 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4652 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4653 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4654 {AShadow, Idx, BShadow});
4655 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4656 setOriginForNaryOp(I);
4657 }
4658
4659 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4660 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4661 }
4662
4663 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4664 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4665 }
4666
4667 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4668 return isFixedIntVectorTy(V->getType());
4669 }
4670
4671 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4672 return isFixedFPVectorTy(V->getType());
4673 }
4674
4675 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4676 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4677 // i32 rounding)
4678 //
4679 // Inconveniently, some similar intrinsics have a different operand order:
4680 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4681 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4682 // i16 mask)
4683 //
4684 // If the return type has more elements than A, the excess elements are
4685 // zeroed (and the corresponding shadow is initialized).
4686 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4687 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4688 // i8 mask)
4689 //
4690 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4691 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4692 // where all_or_nothing(x) is fully uninitialized if x has any
4693 // uninitialized bits
4694 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4695 IRBuilder<> IRB(&I);
4696
4697 assert(I.arg_size() == 4);
4698 Value *A = I.getOperand(0);
4699 Value *WriteThrough;
4700     Value *Mask;
4701     Value *RoundingMode;
4702 if (LastMask) {
4703 WriteThrough = I.getOperand(2);
4704 Mask = I.getOperand(3);
4705 RoundingMode = I.getOperand(1);
4706 } else {
4707 WriteThrough = I.getOperand(1);
4708 Mask = I.getOperand(2);
4709 RoundingMode = I.getOperand(3);
4710 }
4711
4712 assert(isFixedFPVector(A));
4713 assert(isFixedIntVector(WriteThrough));
4714
4715 unsigned ANumElements =
4716 cast<FixedVectorType>(A->getType())->getNumElements();
4717 [[maybe_unused]] unsigned WriteThruNumElements =
4718 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4719 assert(ANumElements == WriteThruNumElements ||
4720 ANumElements * 2 == WriteThruNumElements);
4721
4722 assert(Mask->getType()->isIntegerTy());
4723 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4724 assert(ANumElements == MaskNumElements ||
4725 ANumElements * 2 == MaskNumElements);
4726
4727 assert(WriteThruNumElements == MaskNumElements);
4728
4729 // Some bits of the mask may be unused, though it's unusual to have partly
4730 // uninitialized bits.
4731 insertCheckShadowOf(Mask, &I);
4732
4733 assert(RoundingMode->getType()->isIntegerTy());
4734 // Only some bits of the rounding mode are used, though it's very
4735 // unusual to have uninitialized bits there (more commonly, it's a
4736 // constant).
4737 insertCheckShadowOf(RoundingMode, &I);
4738
4739 assert(I.getType() == WriteThrough->getType());
4740
4741 Value *AShadow = getShadow(A);
4742 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4743
4744 if (ANumElements * 2 == MaskNumElements) {
4745 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4746 // from the zeroed shadow instead of the writethrough's shadow.
4747 Mask =
4748 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4749 Mask =
4750 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4751 }
4752
4753     // Convert the iN mask to <N x i1> (e.g., i16 to <16 x i1>)
4754 Mask = IRB.CreateBitCast(
4755 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4756 "_ms_mask_bitcast");
4757
4758 /// For floating-point to integer conversion, the output is:
4759 /// - fully uninitialized if *any* bit of the input is uninitialized
4760     /// - fully initialized if all bits of the input are initialized
4761 /// We apply the same principle on a per-element basis for vectors.
4762 ///
4763 /// We use the scalar width of the return type instead of A's.
4764 AShadow = IRB.CreateSExt(
4765 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4766 getShadowTy(&I), "_ms_a_shadow");
4767
4768 Value *WriteThroughShadow = getShadow(WriteThrough);
4769 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4770 "_ms_writethru_select");
4771
4772 setShadow(&I, Shadow);
4773 setOriginForNaryOp(I);
4774 }
4775
4776 // Instrument BMI / BMI2 intrinsics.
4777 // All of these intrinsics are Z = I(X, Y)
4778 // where the types of all operands and the result match, and are either i32 or
4779 // i64. The following instrumentation happens to work for all of them:
4780 // Sz = I(Sx, Y) | (sext (Sy != 0))
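//
// e.g., for i32 @llvm.x86.bmi.pext.32(i32 %x, i32 %y) this computes
//   Sz = pext(Sx, %y) | (sext (Sy != 0))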
4781 void handleBmiIntrinsic(IntrinsicInst &I) {
4782 IRBuilder<> IRB(&I);
4783 Type *ShadowTy = getShadowTy(&I);
4784
4785 // If any bit of the mask operand is poisoned, then the whole thing is.
4786 Value *SMask = getShadow(&I, 1);
4787 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4788 ShadowTy);
4789 // Apply the same intrinsic to the shadow of the first operand.
4790 Value *S = IRB.CreateCall(I.getCalledFunction(),
4791 {getShadow(&I, 0), I.getOperand(1)});
4792 S = IRB.CreateOr(SMask, S);
4793 setShadow(&I, S);
4794 setOriginForNaryOp(I);
4795 }
4796
4797 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4798 SmallVector<int, 8> Mask;
4799 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4800 Mask.append(2, X);
4801 }
4802 return Mask;
4803 }
4804
4805 // Instrument pclmul intrinsics.
4806 // These intrinsics operate either on odd or on even elements of the input
4807 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4808 // Replace the unused elements with copies of the used ones, ex:
4809 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4810 // or
4811 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4812 // and then apply the usual shadow combining logic.
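//
// e.g., for illustration:
//   <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a, <2 x i64> %b, i8 %imm)
// where bit 0 of %imm selects the even/odd element of %a, and bit 4 selects
// the even/odd element of %b (hence the Imm & 0x01 / Imm & 0x10 tests below).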
4813 void handlePclmulIntrinsic(IntrinsicInst &I) {
4814 IRBuilder<> IRB(&I);
4815 unsigned Width =
4816 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4817 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4818 "pclmul 3rd operand must be a constant");
4819 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4820 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4821 getPclmulMask(Width, Imm & 0x01));
4822 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4823 getPclmulMask(Width, Imm & 0x10));
4824 ShadowAndOriginCombiner SOC(this, IRB);
4825 SOC.Add(Shuf0, getOrigin(&I, 0));
4826 SOC.Add(Shuf1, getOrigin(&I, 1));
4827 SOC.Done(&I);
4828 }
4829
4830 // Instrument _mm_*_sd|ss intrinsics
4831 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4832 IRBuilder<> IRB(&I);
4833 unsigned Width =
4834 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4835 Value *First = getShadow(&I, 0);
4836 Value *Second = getShadow(&I, 1);
4837 // First element of second operand, remaining elements of first operand
4838 SmallVector<int, 16> Mask;
4839 Mask.push_back(Width);
4840 for (unsigned i = 1; i < Width; i++)
4841 Mask.push_back(i);
4842 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4843
4844 setShadow(&I, Shadow);
4845 setOriginForNaryOp(I);
4846 }
4847
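// Instrument ptest/vtest-style intrinsics, e.g.,
//   i32 @llvm.x86.sse41.ptestz(<2 x i64> %a, <2 x i64> %b)
// The scalar result depends on every element of both operands, so the operand
// shadows are OR'd, collapsed to a scalar, and zero-extended to the result's
// shadow type.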
4848 void handleVtestIntrinsic(IntrinsicInst &I) {
4849 IRBuilder<> IRB(&I);
4850 Value *Shadow0 = getShadow(&I, 0);
4851 Value *Shadow1 = getShadow(&I, 1);
4852 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4853 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4854 Value *Scalar = convertShadowToScalar(NZ, IRB);
4855 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4856
4857 setShadow(&I, Shadow);
4858 setOriginForNaryOp(I);
4859 }
4860
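// Instrument two-operand _mm_*_sd|ss intrinsics: the lowest element of the
// result depends on the lowest elements of both operands (their shadows are
// OR'd), while the remaining elements are taken from the first operand.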
4861 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4862 IRBuilder<> IRB(&I);
4863 unsigned Width =
4864 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4865 Value *First = getShadow(&I, 0);
4866 Value *Second = getShadow(&I, 1);
4867 Value *OrShadow = IRB.CreateOr(First, Second);
4868 // First element of both OR'd together, remaining elements of first operand
4869 SmallVector<int, 16> Mask;
4870 Mask.push_back(Width);
4871 for (unsigned i = 1; i < Width; i++)
4872 Mask.push_back(i);
4873 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4874
4875 setShadow(&I, Shadow);
4876 setOriginForNaryOp(I);
4877 }
4878
4879 // _mm_round_pd / _mm_round_ps.
4880 // Similar to maybeHandleSimpleNomemIntrinsic except
4881 // the second argument is guaranteed to be a constant integer.
4882 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4883 assert(I.getArgOperand(0)->getType() == I.getType());
4884 assert(I.arg_size() == 2);
4885 assert(isa<ConstantInt>(I.getArgOperand(1)));
4886
4887 IRBuilder<> IRB(&I);
4888 ShadowAndOriginCombiner SC(this, IRB);
4889 SC.Add(I.getArgOperand(0));
4890 SC.Done(&I);
4891 }
4892
4893 // Instrument @llvm.abs intrinsic.
4894 //
4895 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4896 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
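//
// The shadow is copied from <Src>, except that when <is_int_min_poison> is
// true, elements equal to INT_MIN produce a poison result and therefore get a
// fully poisoned shadow.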
4897 void handleAbsIntrinsic(IntrinsicInst &I) {
4898 assert(I.arg_size() == 2);
4899 Value *Src = I.getArgOperand(0);
4900 Value *IsIntMinPoison = I.getArgOperand(1);
4901
4902 assert(I.getType()->isIntOrIntVectorTy());
4903
4904 assert(Src->getType() == I.getType());
4905
4906 assert(IsIntMinPoison->getType()->isIntegerTy());
4907 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4908
4909 IRBuilder<> IRB(&I);
4910 Value *SrcShadow = getShadow(Src);
4911
4912 APInt MinVal =
4913 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4914 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4915 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4916
4917 Value *PoisonedShadow = getPoisonedShadow(Src);
4918 Value *PoisonedIfIntMinShadow =
4919 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4920 Value *Shadow =
4921 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4922
4923 setShadow(&I, Shadow);
4924 setOrigin(&I, getOrigin(&I, 0));
4925 }
4926
4927 void handleIsFpClass(IntrinsicInst &I) {
4928 IRBuilder<> IRB(&I);
4929 Value *Shadow = getShadow(&I, 0);
4930 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4931 setOrigin(&I, getOrigin(&I, 0));
4932 }
4933
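// Instrument llvm.*.with.overflow intrinsics, which return a
// {result, overflow bit} pair: the result's shadow is the OR of the operand
// shadows, and the overflow bit's shadow is poisoned iff any bit of either
// operand is poisoned.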
4934 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4935 IRBuilder<> IRB(&I);
4936 Value *Shadow0 = getShadow(&I, 0);
4937 Value *Shadow1 = getShadow(&I, 1);
4938 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4939 Value *ShadowElt1 =
4940 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4941
4942 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4943 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4944 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4945
4946 setShadow(&I, Shadow);
4947 setOriginForNaryOp(I);
4948 }
4949
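// Returns the shadow of element 0 of the given fixed-width vector value.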
4950 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4951 assert(isa<FixedVectorType>(V->getType()));
4952 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4953 Value *Shadow = getShadow(V);
4954 return IRB.CreateExtractElement(Shadow,
4955 ConstantInt::get(IRB.getInt32Ty(), 0));
4956 }
4957
4958 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4959 //
4960 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4961 // (<8 x i64>, <16 x i8>, i8)
4962 // A WriteThru Mask
4963 //
4964 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4965 // (<16 x i32>, <16 x i8>, i16)
4966 //
4967 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4968 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4969 //
4970 // If Dst has more elements than A, the excess elements are zeroed (and the
4971 // corresponding shadow is initialized).
4972 //
4973 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4974 // and is much faster than this handler.
4975 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4976 IRBuilder<> IRB(&I);
4977
4978 assert(I.arg_size() == 3);
4979 Value *A = I.getOperand(0);
4980 Value *WriteThrough = I.getOperand(1);
4981 Value *Mask = I.getOperand(2);
4982
4983 assert(isFixedIntVector(A));
4984 assert(isFixedIntVector(WriteThrough));
4985
4986 unsigned ANumElements =
4987 cast<FixedVectorType>(A->getType())->getNumElements();
4988 unsigned OutputNumElements =
4989 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4990 assert(ANumElements == OutputNumElements ||
4991 ANumElements * 2 == OutputNumElements);
4992
4993 assert(Mask->getType()->isIntegerTy());
4994 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4995 insertCheckShadowOf(Mask, &I);
4996
4997 assert(I.getType() == WriteThrough->getType());
4998
4999 // Widen the mask, if necessary, to have one bit per element of the output
5000 // vector.
5001 // We want the extra bits to have '1's, so that the CreateSelect will
5002 // select the values from AShadow instead of WriteThroughShadow ("maskless"
5003 // versions of the intrinsics are sometimes implemented using an all-1's
5004 // mask and an undefined value for WriteThroughShadow). We accomplish this
5005 // by using bitwise NOT before and after the ZExt.
5006 if (ANumElements != OutputNumElements) {
5007 Mask = IRB.CreateNot(Mask);
5008 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
5009 "_ms_widen_mask");
5010 Mask = IRB.CreateNot(Mask);
5011 }
5012 Mask = IRB.CreateBitCast(
5013 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5014
5015 Value *AShadow = getShadow(A);
5016
5017 // The return type might have more elements than the input.
5018 // Temporarily shrink the return type's number of elements.
5019 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
5020
5021 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
5022 // This handler treats them all as truncation, which leads to some rare
5023 // false positives in the cases where the truncated bytes could
5024 // unambiguously saturate the value e.g., if A = ??????10 ????????
5025 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
5026 // fully defined, but the truncated byte is ????????.
5027 //
5028 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
5029 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
5030 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
5031
5032 Value *WriteThroughShadow = getShadow(WriteThrough);
5033
5034 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
5035 setShadow(&I, Shadow);
5036 setOriginForNaryOp(I);
5037 }
5038
5039 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
5040 // values and perform an operation whose shadow propagation should be handled
5041 // as all-or-nothing [*], with masking provided by a vector and a mask
5042 // supplied as an integer.
5043 //
5044 // [*] if all bits of a vector element are initialized, the output is fully
5045 // initialized; otherwise, the output is fully uninitialized
5046 //
5047 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
5048 // (<16 x float>, <16 x float>, i16)
5049 // A WriteThru Mask
5050 //
5051 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
5052 // (<2 x double>, <2 x double>, i8)
5053 //
5054 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
5055 // (<8 x double>, i32, <8 x double>, i8, i32)
5056 // A Imm WriteThru Mask Rounding
5057 //
5058 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
5059 // be fully initialized.
5060 //
5061 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
5062 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
5063 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
5064 unsigned WriteThruIndex,
5065 unsigned MaskIndex) {
5066 IRBuilder<> IRB(&I);
5067
5068 unsigned NumArgs = I.arg_size();
5069 assert(AIndex < NumArgs);
5070 assert(WriteThruIndex < NumArgs);
5071 assert(MaskIndex < NumArgs);
5072 assert(AIndex != WriteThruIndex);
5073 assert(AIndex != MaskIndex);
5074 assert(WriteThruIndex != MaskIndex);
5075
5076 Value *A = I.getOperand(AIndex);
5077 Value *WriteThru = I.getOperand(WriteThruIndex);
5078 Value *Mask = I.getOperand(MaskIndex);
5079
5080 assert(isFixedFPVector(A));
5081 assert(isFixedFPVector(WriteThru));
5082
5083 [[maybe_unused]] unsigned ANumElements =
5084 cast<FixedVectorType>(A->getType())->getNumElements();
5085 unsigned OutputNumElements =
5086 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
5087 assert(ANumElements == OutputNumElements);
5088
5089 for (unsigned i = 0; i < NumArgs; ++i) {
5090 if (i != AIndex && i != WriteThruIndex) {
5091 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5092 // they be fully initialized.
5093 assert(I.getOperand(i)->getType()->isIntegerTy());
5094 insertCheckShadowOf(I.getOperand(i), &I);
5095 }
5096 }
5097
5098 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5099 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
5100 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
5101 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5102
5103 assert(I.getType() == WriteThru->getType());
5104
5105 Mask = IRB.CreateBitCast(
5106 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5107
5108 Value *AShadow = getShadow(A);
5109
5110 // All-or-nothing shadow
5111 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
5112 AShadow->getType());
5113
5114 Value *WriteThruShadow = getShadow(WriteThru);
5115
5116 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
5117 setShadow(&I, Shadow);
5118
5119 setOriginForNaryOp(I);
5120 }
5121
5122 // For sh.* compiler intrinsics:
5123 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5124 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5125 // A B WriteThru Mask RoundingMode
5126 //
5127 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5128 // DstShadow[1..7] = AShadow[1..7]
5129 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5130 IRBuilder<> IRB(&I);
5131
5132 assert(I.arg_size() == 5);
5133 Value *A = I.getOperand(0);
5134 Value *B = I.getOperand(1);
5135 Value *WriteThrough = I.getOperand(2);
5136 Value *Mask = I.getOperand(3);
5137 Value *RoundingMode = I.getOperand(4);
5138
5139 // Technically, we could probably just check whether the LSB is
5140 // initialized, but intuitively it feels like a partly uninitialized mask
5141 // is unintended, and we should warn the user immediately.
5142 insertCheckShadowOf(Mask, &I);
5143 insertCheckShadowOf(RoundingMode, &I);
5144
5145 assert(isa<FixedVectorType>(A->getType()));
5146 unsigned NumElements =
5147 cast<FixedVectorType>(A->getType())->getNumElements();
5148 assert(NumElements == 8);
5149 assert(A->getType() == B->getType());
5150 assert(B->getType() == WriteThrough->getType());
5151 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5152 assert(RoundingMode->getType()->isIntegerTy());
5153
5154 Value *ALowerShadow = extractLowerShadow(IRB, A);
5155 Value *BLowerShadow = extractLowerShadow(IRB, B);
5156
5157 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5158
5159 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5160
5161 Mask = IRB.CreateBitCast(
5162 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5163 Value *MaskLower =
5164 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5165
5166 Value *AShadow = getShadow(A);
5167 Value *DstLowerShadow =
5168 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5169 Value *DstShadow = IRB.CreateInsertElement(
5170 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5171 "_msprop");
5172
5173 setShadow(&I, DstShadow);
5174 setOriginForNaryOp(I);
5175 }
5176
5177 // Approximately handle AVX Galois Field Affine Transformation
5178 //
5179 // e.g.,
5180 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5181 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5182 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5183 // Out A x b
5184 // where A and x are packed matrices, b is a vector,
5185 // Out = A * x + b in GF(2)
5186 //
5187 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5188 // computation also includes a parity calculation.
5189 //
5190 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5191 // Out_Shadow = (V1_Shadow & V2_Shadow)
5192 // | (V1 & V2_Shadow)
5193 // | (V1_Shadow & V2 )
5194 //
5195 // We approximate the shadow of gf2p8affineqb using:
5196 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5197 // | gf2p8affineqb(x, A_shadow, 0)
5198 // | gf2p8affineqb(x_Shadow, A, 0)
5199 // | set1_epi8(b_Shadow)
5200 //
5201 // This approximation has false negatives: if an intermediate dot-product
5202 // contains an even number of 1's, the parity is 0.
5203 // It has no false positives.
5204 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5205 IRBuilder<> IRB(&I);
5206
5207 assert(I.arg_size() == 3);
5208 Value *A = I.getOperand(0);
5209 Value *X = I.getOperand(1);
5210 Value *B = I.getOperand(2);
5211
5212 assert(isFixedIntVector(A));
5213 assert(cast<VectorType>(A->getType())
5214 ->getElementType()
5215 ->getScalarSizeInBits() == 8);
5216
5217 assert(A->getType() == X->getType());
5218
5219 assert(B->getType()->isIntegerTy());
5220 assert(B->getType()->getScalarSizeInBits() == 8);
5221
5222 assert(I.getType() == A->getType());
5223
5224 Value *AShadow = getShadow(A);
5225 Value *XShadow = getShadow(X);
5226 Value *BZeroShadow = getCleanShadow(B);
5227
5228 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5229 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5230 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5231 {X, AShadow, BZeroShadow});
5232 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5233 {XShadow, A, BZeroShadow});
5234
5235 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5236 Value *BShadow = getShadow(B);
5237 Value *BBroadcastShadow = getCleanShadow(AShadow);
5238 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5239     // This loop generates a lot of LLVM IR, which we expect CodeGen to lower
5240     // appropriately (e.g., VPBROADCASTB).
5241 // Besides, b is often a constant, in which case it is fully initialized.
5242 for (unsigned i = 0; i < NumElements; i++)
5243 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5244
5245 setShadow(&I, IRB.CreateOr(
5246 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5247 setOriginForNaryOp(I);
5248 }
5249
5250 // Handle Arm NEON vector load intrinsics (vld*).
5251 //
5252 // The WithLane instructions (ld[234]lane) are similar to:
5253 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5254 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5255 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5256 // %A)
5257 //
5258 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5259 // to:
5260 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5261 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5262 unsigned int numArgs = I.arg_size();
5263
5264 // Return type is a struct of vectors of integers or floating-point
5265 assert(I.getType()->isStructTy());
5266 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5267 assert(RetTy->getNumElements() > 0);
5268     assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5269            RetTy->getElementType(0)->isFPOrFPVectorTy());
5270 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5271 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5272
5273 if (WithLane) {
5274 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5275 assert(4 <= numArgs && numArgs <= 6);
5276
5277 // Return type is a struct of the input vectors
5278 assert(RetTy->getNumElements() + 2 == numArgs);
5279 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5280 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5281 } else {
5282 assert(numArgs == 1);
5283 }
5284
5285 IRBuilder<> IRB(&I);
5286
5287 SmallVector<Value *, 6> ShadowArgs;
5288 if (WithLane) {
5289 for (unsigned int i = 0; i < numArgs - 2; i++)
5290 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5291
5292 // Lane number, passed verbatim
5293 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5294 ShadowArgs.push_back(LaneNumber);
5295
5296 // TODO: blend shadow of lane number into output shadow?
5297 insertCheckShadowOf(LaneNumber, &I);
5298 }
5299
5300 Value *Src = I.getArgOperand(numArgs - 1);
5301 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5302
5303 Type *SrcShadowTy = getShadowTy(Src);
5304 auto [SrcShadowPtr, SrcOriginPtr] =
5305 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5306 ShadowArgs.push_back(SrcShadowPtr);
5307
5308 // The NEON vector load instructions handled by this function all have
5309 // integer variants. It is easier to use those rather than trying to cast
5310 // a struct of vectors of floats into a struct of vectors of integers.
5311 CallInst *CI =
5312 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5313 setShadow(&I, CI);
5314
5315 if (!MS.TrackOrigins)
5316 return;
5317
5318 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5319 setOrigin(&I, PtrSrcOrigin);
5320 }
5321
5322 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5323 /// and vst{2,3,4}lane).
5324 ///
5325 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5326 /// last argument, with the initial arguments being the inputs (and lane
5327 /// number for vst{2,3,4}lane). They return void.
5328 ///
5329 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5330 /// abcdabcdabcdabcd... into *outP
5331 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5332 /// writes aaaa...bbbb...cccc...dddd... into *outP
5333 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5334 /// These instructions can all be instrumented with essentially the same
5335 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5336 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5337 IRBuilder<> IRB(&I);
5338
5339 // Don't use getNumOperands() because it includes the callee
5340 int numArgOperands = I.arg_size();
5341
5342 // The last arg operand is the output (pointer)
5343 assert(numArgOperands >= 1);
5344 Value *Addr = I.getArgOperand(numArgOperands - 1);
5345 assert(Addr->getType()->isPointerTy());
5346 int skipTrailingOperands = 1;
5347
5349 insertCheckShadowOf(Addr, &I);
5350
5351 // Second-last operand is the lane number (for vst{2,3,4}lane)
5352 if (useLane) {
5353 skipTrailingOperands++;
5354 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5355       assert(isa<IntegerType>(
5356           I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5357 }
5358
5359 SmallVector<Value *, 8> ShadowArgs;
5360 // All the initial operands are the inputs
5361 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5362 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5363 Value *Shadow = getShadow(&I, i);
5364 ShadowArgs.append(1, Shadow);
5365 }
5366
5367 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5368 // e.g., for:
5369 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5370 // we know the type of the output (and its shadow) is <16 x i8>.
5371 //
5372 // Arm NEON VST is unusual because the last argument is the output address:
5373 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5374 // call void @llvm.aarch64.neon.st2.v16i8.p0
5375 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5376 // and we have no type information about P's operand. We must manually
5377 // compute the type (<16 x i8> x 2).
5378 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5379 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5380 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5381 (numArgOperands - skipTrailingOperands));
5382 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5383
5384 if (useLane)
5385 ShadowArgs.append(1,
5386 I.getArgOperand(numArgOperands - skipTrailingOperands));
5387
5388 Value *OutputShadowPtr, *OutputOriginPtr;
5389 // AArch64 NEON does not need alignment (unless OS requires it)
5390 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5391 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5392 ShadowArgs.append(1, OutputShadowPtr);
5393
5394 CallInst *CI =
5395 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5396 setShadow(&I, CI);
5397
5398 if (MS.TrackOrigins) {
5399 // TODO: if we modelled the vst* instruction more precisely, we could
5400 // more accurately track the origins (e.g., if both inputs are
5401 // uninitialized for vst2, we currently blame the second input, even
5402 // though part of the output depends only on the first input).
5403 //
5404 // This is particularly imprecise for vst{2,3,4}lane, since only one
5405 // lane of each input is actually copied to the output.
5406 OriginCombiner OC(this, IRB);
5407 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5408 OC.Add(I.getArgOperand(i));
5409
5410 const DataLayout &DL = F.getDataLayout();
5411 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5412 OutputOriginPtr);
5413 }
5414 }
5415
5416 // <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8
5417 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5418 // <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
5419 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5420 // <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8
5421 // (<4 x i32> R%, <16 x i8> %X, <16 x i8> %Y)
5422 //
5423 // Note:
5424 // - < 4 x *> is a 2x2 matrix
5425 // - <16 x *> is a 2x8 matrix (%X) and an 8x2 matrix (%Y), respectively
5426 //
5427 // The general shadow propagation approach is:
5428 // 1) get the shadows of the input matrices %X and %Y
5429 // 2) change the shadow values to 0x1 if the corresponding value is fully
5430 // initialized, and 0x0 otherwise
5431 // 3) perform a matrix multiplication on the shadows of %X and %Y. The output
5432 // will be a 2x2 matrix; for each element, a value of 0x8 means all the
5433 // corresponding inputs were clean.
5434 // 4) blend in the shadow of %R
5435 //
5436 // TODO: consider allowing multiplication of zero with an uninitialized value
5437 // to result in an initialized value.
5438 //
5439 // TODO: handle floating-point matrix multiply using ummla on the shadows:
5440 // case Intrinsic::aarch64_neon_bfmmla:
5441 // handleNEONMatrixMultiply(I, /*ARows=*/ 2, /*ACols=*/ 4,
5442 // /*BRows=*/ 4, /*BCols=*/ 2);
5443 //
5444 void handleNEONMatrixMultiply(IntrinsicInst &I, unsigned int ARows,
5445 unsigned int ACols, unsigned int BRows,
5446 unsigned int BCols) {
5447 IRBuilder<> IRB(&I);
5448
5449 assert(I.arg_size() == 3);
5450 Value *R = I.getArgOperand(0);
5451 Value *A = I.getArgOperand(1);
5452 Value *B = I.getArgOperand(2);
5453
5454 assert(I.getType() == R->getType());
5455
5456 assert(isa<FixedVectorType>(R->getType()));
5457 assert(isa<FixedVectorType>(A->getType()));
5458 assert(isa<FixedVectorType>(B->getType()));
5459
5460 [[maybe_unused]] FixedVectorType *RTy = cast<FixedVectorType>(R->getType());
5461 [[maybe_unused]] FixedVectorType *ATy = cast<FixedVectorType>(A->getType());
5462 [[maybe_unused]] FixedVectorType *BTy = cast<FixedVectorType>(B->getType());
5463
5464 assert(ACols == BRows);
5465 assert(ATy->getNumElements() == ARows * ACols);
5466 assert(BTy->getNumElements() == BRows * BCols);
5467 assert(RTy->getNumElements() == ARows * BCols);
5468
5469 LLVM_DEBUG(dbgs() << "### R: " << *RTy->getElementType() << "\n");
5470 LLVM_DEBUG(dbgs() << "### A: " << *ATy->getElementType() << "\n");
5471 if (RTy->getElementType()->isIntegerTy()) {
5472 // Types are not identical e.g., <4 x i32> %R, <16 x i8> %A
5474 } else {
5477 }
5478 assert(ATy->getElementType() == BTy->getElementType());
5479
5480 Value *ShadowR = getShadow(&I, 0);
5481 Value *ShadowA = getShadow(&I, 1);
5482 Value *ShadowB = getShadow(&I, 2);
5483
5484 // If the value is fully initialized, the shadow will be 000...001.
5485 // Otherwise, the shadow will be all zero.
5486 // (This is the opposite of how we typically handle shadows.)
5487 ShadowA = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowA, getCleanShadow(A)),
5488 ShadowA->getType());
5489 ShadowB = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowB, getCleanShadow(B)),
5490 ShadowB->getType());
5491
5492 Value *ShadowAB = IRB.CreateIntrinsic(
5493 I.getType(), I.getIntrinsicID(), {getCleanShadow(R), ShadowA, ShadowB});
5494
5495 Value *FullyInit = ConstantVector::getSplat(
5496 RTy->getElementCount(),
5497 ConstantInt::get(cast<VectorType>(getShadowTy(R))->getElementType(),
5498 ACols));
5499
5500 ShadowAB = IRB.CreateSExt(IRB.CreateICmpNE(ShadowAB, FullyInit),
5501 ShadowAB->getType());
5502
5503 ShadowR = IRB.CreateSExt(IRB.CreateICmpNE(ShadowR, getCleanShadow(R)),
5504 ShadowR->getType());
5505
5506 setShadow(&I, IRB.CreateOr(ShadowAB, ShadowR));
5507 setOriginForNaryOp(I);
5508 }
5509
5510 /// Handle intrinsics by applying the intrinsic to the shadows.
5511 ///
5512 /// The trailing arguments are passed verbatim to the intrinsic, though any
5513 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5514 /// intrinsic with one trailing verbatim argument:
5515 /// out = intrinsic(var1, var2, opType)
5516 /// we compute:
5517 /// shadow[out] =
5518 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5519 ///
5520 /// Typically, shadowIntrinsicID will be specified by the caller to be
5521 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5522 /// intrinsic of the same type.
5523 ///
5524 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5525 /// bit-patterns (for example, if the intrinsic accepts floats for
5526 /// var1, we require that it doesn't care if inputs are NaNs).
5527 ///
5528 /// For example, this can be applied to the Arm NEON vector table intrinsics
5529 /// (tbl{1,2,3,4}).
5530 ///
5531 /// The origin is approximated using setOriginForNaryOp.
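///
/// For illustration, if the Arm NEON tbl1 intrinsic
///   <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %t, <16 x i8> %idx)
/// is handled with one trailing verbatim argument, the computed shadow is
///   shadow[out] = tbl1(shadow[%t], %idx) | shadow[%idx]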
5532 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5533 Intrinsic::ID shadowIntrinsicID,
5534 unsigned int trailingVerbatimArgs) {
5535 IRBuilder<> IRB(&I);
5536
5537 assert(trailingVerbatimArgs < I.arg_size());
5538
5539 SmallVector<Value *, 8> ShadowArgs;
5540 // Don't use getNumOperands() because it includes the callee
5541 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5542 Value *Shadow = getShadow(&I, i);
5543
5544 // Shadows are integer-ish types but some intrinsics require a
5545 // different (e.g., floating-point) type.
5546 ShadowArgs.push_back(
5547 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5548 }
5549
5550 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5551 i++) {
5552 Value *Arg = I.getArgOperand(i);
5553 ShadowArgs.push_back(Arg);
5554 }
5555
5556 CallInst *CI =
5557 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5558 Value *CombinedShadow = CI;
5559
5560 // Combine the computed shadow with the shadow of trailing args
5561 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5562 i++) {
5563 Value *Shadow =
5564 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5565 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5566 }
5567
5568 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5569
5570 setOriginForNaryOp(I);
5571 }
5572
5573 // Approximation only
5574 //
5575 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5576 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5577 assert(I.arg_size() == 2);
5578
5579 handleShadowOr(I);
5580 }
5581
5582 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5583 switch (I.getIntrinsicID()) {
5584 case Intrinsic::uadd_with_overflow:
5585 case Intrinsic::sadd_with_overflow:
5586 case Intrinsic::usub_with_overflow:
5587 case Intrinsic::ssub_with_overflow:
5588 case Intrinsic::umul_with_overflow:
5589 case Intrinsic::smul_with_overflow:
5590 handleArithmeticWithOverflow(I);
5591 break;
5592 case Intrinsic::abs:
5593 handleAbsIntrinsic(I);
5594 break;
5595 case Intrinsic::bitreverse:
5596 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5597 /*trailingVerbatimArgs*/ 0);
5598 break;
5599 case Intrinsic::is_fpclass:
5600 handleIsFpClass(I);
5601 break;
5602 case Intrinsic::lifetime_start:
5603 handleLifetimeStart(I);
5604 break;
5605 case Intrinsic::launder_invariant_group:
5606 case Intrinsic::strip_invariant_group:
5607 handleInvariantGroup(I);
5608 break;
5609 case Intrinsic::bswap:
5610 handleBswap(I);
5611 break;
5612 case Intrinsic::ctlz:
5613 case Intrinsic::cttz:
5614 handleCountLeadingTrailingZeros(I);
5615 break;
5616 case Intrinsic::masked_compressstore:
5617 handleMaskedCompressStore(I);
5618 break;
5619 case Intrinsic::masked_expandload:
5620 handleMaskedExpandLoad(I);
5621 break;
5622 case Intrinsic::masked_gather:
5623 handleMaskedGather(I);
5624 break;
5625 case Intrinsic::masked_scatter:
5626 handleMaskedScatter(I);
5627 break;
5628 case Intrinsic::masked_store:
5629 handleMaskedStore(I);
5630 break;
5631 case Intrinsic::masked_load:
5632 handleMaskedLoad(I);
5633 break;
5634 case Intrinsic::vector_reduce_and:
5635 handleVectorReduceAndIntrinsic(I);
5636 break;
5637 case Intrinsic::vector_reduce_or:
5638 handleVectorReduceOrIntrinsic(I);
5639 break;
5640
5641 case Intrinsic::vector_reduce_add:
5642 case Intrinsic::vector_reduce_xor:
5643 case Intrinsic::vector_reduce_mul:
5644 // Signed/Unsigned Min/Max
5645 // TODO: handling similarly to AND/OR may be more precise.
5646 case Intrinsic::vector_reduce_smax:
5647 case Intrinsic::vector_reduce_smin:
5648 case Intrinsic::vector_reduce_umax:
5649 case Intrinsic::vector_reduce_umin:
5650 // TODO: this has no false positives, but arguably we should check that all
5651 // the bits are initialized.
5652 case Intrinsic::vector_reduce_fmax:
5653 case Intrinsic::vector_reduce_fmin:
5654 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5655 break;
5656
5657 case Intrinsic::vector_reduce_fadd:
5658 case Intrinsic::vector_reduce_fmul:
5659 handleVectorReduceWithStarterIntrinsic(I);
5660 break;
5661
5662 case Intrinsic::scmp:
5663 case Intrinsic::ucmp: {
5664 handleShadowOr(I);
5665 break;
5666 }
5667
5668 case Intrinsic::fshl:
5669 case Intrinsic::fshr:
5670 handleFunnelShift(I);
5671 break;
5672
5673 case Intrinsic::is_constant:
5674 // The result of llvm.is.constant() is always defined.
5675 setShadow(&I, getCleanShadow(&I));
5676 setOrigin(&I, getCleanOrigin());
5677 break;
5678
5679 default:
5680 return false;
5681 }
5682
5683 return true;
5684 }
5685
5686 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5687 switch (I.getIntrinsicID()) {
5688 case Intrinsic::x86_sse_stmxcsr:
5689 handleStmxcsr(I);
5690 break;
5691 case Intrinsic::x86_sse_ldmxcsr:
5692 handleLdmxcsr(I);
5693 break;
5694
5695 // Convert Scalar Double Precision Floating-Point Value
5696 // to Unsigned Doubleword Integer
5697 // etc.
5698 case Intrinsic::x86_avx512_vcvtsd2usi64:
5699 case Intrinsic::x86_avx512_vcvtsd2usi32:
5700 case Intrinsic::x86_avx512_vcvtss2usi64:
5701 case Intrinsic::x86_avx512_vcvtss2usi32:
5702 case Intrinsic::x86_avx512_cvttss2usi64:
5703 case Intrinsic::x86_avx512_cvttss2usi:
5704 case Intrinsic::x86_avx512_cvttsd2usi64:
5705 case Intrinsic::x86_avx512_cvttsd2usi:
5706 case Intrinsic::x86_avx512_cvtusi2ss:
5707 case Intrinsic::x86_avx512_cvtusi642sd:
5708 case Intrinsic::x86_avx512_cvtusi642ss:
5709 handleSSEVectorConvertIntrinsic(I, 1, true);
5710 break;
5711 case Intrinsic::x86_sse2_cvtsd2si64:
5712 case Intrinsic::x86_sse2_cvtsd2si:
5713 case Intrinsic::x86_sse2_cvtsd2ss:
5714 case Intrinsic::x86_sse2_cvttsd2si64:
5715 case Intrinsic::x86_sse2_cvttsd2si:
5716 case Intrinsic::x86_sse_cvtss2si64:
5717 case Intrinsic::x86_sse_cvtss2si:
5718 case Intrinsic::x86_sse_cvttss2si64:
5719 case Intrinsic::x86_sse_cvttss2si:
5720 handleSSEVectorConvertIntrinsic(I, 1);
5721 break;
5722 case Intrinsic::x86_sse_cvtps2pi:
5723 case Intrinsic::x86_sse_cvttps2pi:
5724 handleSSEVectorConvertIntrinsic(I, 2);
5725 break;
5726
5727 // TODO:
5728 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5729 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5730 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5731
5732 case Intrinsic::x86_vcvtps2ph_128:
5733 case Intrinsic::x86_vcvtps2ph_256: {
5734 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5735 break;
5736 }
5737
5738 // Convert Packed Single Precision Floating-Point Values
5739 // to Packed Signed Doubleword Integer Values
5740 //
5741 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5742 // (<16 x float>, <16 x i32>, i16, i32)
5743 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5744 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5745 break;
5746
5747 // Convert Packed Double Precision Floating-Point Values
5748 // to Packed Single Precision Floating-Point Values
5749 case Intrinsic::x86_sse2_cvtpd2ps:
5750 case Intrinsic::x86_sse2_cvtps2dq:
5751 case Intrinsic::x86_sse2_cvtpd2dq:
5752 case Intrinsic::x86_sse2_cvttps2dq:
5753 case Intrinsic::x86_sse2_cvttpd2dq:
5754 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5755 case Intrinsic::x86_avx_cvt_ps2dq_256:
5756 case Intrinsic::x86_avx_cvt_pd2dq_256:
5757 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5758 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5759 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5760 break;
5761 }
5762
5763 // Convert Single-Precision FP Value to 16-bit FP Value
5764 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5765 // (<16 x float>, i32, <16 x i16>, i16)
5766 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5767 // (<4 x float>, i32, <8 x i16>, i8)
5768 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5769 // (<8 x float>, i32, <8 x i16>, i8)
5770 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5771 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5772 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5773 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5774 break;
5775
5776 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5777 case Intrinsic::x86_avx512_psll_w_512:
5778 case Intrinsic::x86_avx512_psll_d_512:
5779 case Intrinsic::x86_avx512_psll_q_512:
5780 case Intrinsic::x86_avx512_pslli_w_512:
5781 case Intrinsic::x86_avx512_pslli_d_512:
5782 case Intrinsic::x86_avx512_pslli_q_512:
5783 case Intrinsic::x86_avx512_psrl_w_512:
5784 case Intrinsic::x86_avx512_psrl_d_512:
5785 case Intrinsic::x86_avx512_psrl_q_512:
5786 case Intrinsic::x86_avx512_psra_w_512:
5787 case Intrinsic::x86_avx512_psra_d_512:
5788 case Intrinsic::x86_avx512_psra_q_512:
5789 case Intrinsic::x86_avx512_psrli_w_512:
5790 case Intrinsic::x86_avx512_psrli_d_512:
5791 case Intrinsic::x86_avx512_psrli_q_512:
5792 case Intrinsic::x86_avx512_psrai_w_512:
5793 case Intrinsic::x86_avx512_psrai_d_512:
5794 case Intrinsic::x86_avx512_psrai_q_512:
5795 case Intrinsic::x86_avx512_psra_q_256:
5796 case Intrinsic::x86_avx512_psra_q_128:
5797 case Intrinsic::x86_avx512_psrai_q_256:
5798 case Intrinsic::x86_avx512_psrai_q_128:
5799 case Intrinsic::x86_avx2_psll_w:
5800 case Intrinsic::x86_avx2_psll_d:
5801 case Intrinsic::x86_avx2_psll_q:
5802 case Intrinsic::x86_avx2_pslli_w:
5803 case Intrinsic::x86_avx2_pslli_d:
5804 case Intrinsic::x86_avx2_pslli_q:
5805 case Intrinsic::x86_avx2_psrl_w:
5806 case Intrinsic::x86_avx2_psrl_d:
5807 case Intrinsic::x86_avx2_psrl_q:
5808 case Intrinsic::x86_avx2_psra_w:
5809 case Intrinsic::x86_avx2_psra_d:
5810 case Intrinsic::x86_avx2_psrli_w:
5811 case Intrinsic::x86_avx2_psrli_d:
5812 case Intrinsic::x86_avx2_psrli_q:
5813 case Intrinsic::x86_avx2_psrai_w:
5814 case Intrinsic::x86_avx2_psrai_d:
5815 case Intrinsic::x86_sse2_psll_w:
5816 case Intrinsic::x86_sse2_psll_d:
5817 case Intrinsic::x86_sse2_psll_q:
5818 case Intrinsic::x86_sse2_pslli_w:
5819 case Intrinsic::x86_sse2_pslli_d:
5820 case Intrinsic::x86_sse2_pslli_q:
5821 case Intrinsic::x86_sse2_psrl_w:
5822 case Intrinsic::x86_sse2_psrl_d:
5823 case Intrinsic::x86_sse2_psrl_q:
5824 case Intrinsic::x86_sse2_psra_w:
5825 case Intrinsic::x86_sse2_psra_d:
5826 case Intrinsic::x86_sse2_psrli_w:
5827 case Intrinsic::x86_sse2_psrli_d:
5828 case Intrinsic::x86_sse2_psrli_q:
5829 case Intrinsic::x86_sse2_psrai_w:
5830 case Intrinsic::x86_sse2_psrai_d:
5831 case Intrinsic::x86_mmx_psll_w:
5832 case Intrinsic::x86_mmx_psll_d:
5833 case Intrinsic::x86_mmx_psll_q:
5834 case Intrinsic::x86_mmx_pslli_w:
5835 case Intrinsic::x86_mmx_pslli_d:
5836 case Intrinsic::x86_mmx_pslli_q:
5837 case Intrinsic::x86_mmx_psrl_w:
5838 case Intrinsic::x86_mmx_psrl_d:
5839 case Intrinsic::x86_mmx_psrl_q:
5840 case Intrinsic::x86_mmx_psra_w:
5841 case Intrinsic::x86_mmx_psra_d:
5842 case Intrinsic::x86_mmx_psrli_w:
5843 case Intrinsic::x86_mmx_psrli_d:
5844 case Intrinsic::x86_mmx_psrli_q:
5845 case Intrinsic::x86_mmx_psrai_w:
5846 case Intrinsic::x86_mmx_psrai_d:
5847 handleVectorShiftIntrinsic(I, /* Variable */ false);
5848 break;
5849 case Intrinsic::x86_avx2_psllv_d:
5850 case Intrinsic::x86_avx2_psllv_d_256:
5851 case Intrinsic::x86_avx512_psllv_d_512:
5852 case Intrinsic::x86_avx2_psllv_q:
5853 case Intrinsic::x86_avx2_psllv_q_256:
5854 case Intrinsic::x86_avx512_psllv_q_512:
5855 case Intrinsic::x86_avx2_psrlv_d:
5856 case Intrinsic::x86_avx2_psrlv_d_256:
5857 case Intrinsic::x86_avx512_psrlv_d_512:
5858 case Intrinsic::x86_avx2_psrlv_q:
5859 case Intrinsic::x86_avx2_psrlv_q_256:
5860 case Intrinsic::x86_avx512_psrlv_q_512:
5861 case Intrinsic::x86_avx2_psrav_d:
5862 case Intrinsic::x86_avx2_psrav_d_256:
5863 case Intrinsic::x86_avx512_psrav_d_512:
5864 case Intrinsic::x86_avx512_psrav_q_128:
5865 case Intrinsic::x86_avx512_psrav_q_256:
5866 case Intrinsic::x86_avx512_psrav_q_512:
5867 handleVectorShiftIntrinsic(I, /* Variable */ true);
5868 break;
5869
5870 // Pack with Signed/Unsigned Saturation
5871 case Intrinsic::x86_sse2_packsswb_128:
5872 case Intrinsic::x86_sse2_packssdw_128:
5873 case Intrinsic::x86_sse2_packuswb_128:
5874 case Intrinsic::x86_sse41_packusdw:
5875 case Intrinsic::x86_avx2_packsswb:
5876 case Intrinsic::x86_avx2_packssdw:
5877 case Intrinsic::x86_avx2_packuswb:
5878 case Intrinsic::x86_avx2_packusdw:
5879 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5880 // (<32 x i16> %a, <32 x i16> %b)
5881 // <32 x i16> @llvm.x86.avx512.packssdw.512
5882 // (<16 x i32> %a, <16 x i32> %b)
5883 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5884 case Intrinsic::x86_avx512_packsswb_512:
5885 case Intrinsic::x86_avx512_packssdw_512:
5886 case Intrinsic::x86_avx512_packuswb_512:
5887 case Intrinsic::x86_avx512_packusdw_512:
5888 handleVectorPackIntrinsic(I);
5889 break;
5890
5891 case Intrinsic::x86_sse41_pblendvb:
5892 case Intrinsic::x86_sse41_blendvpd:
5893 case Intrinsic::x86_sse41_blendvps:
5894 case Intrinsic::x86_avx_blendv_pd_256:
5895 case Intrinsic::x86_avx_blendv_ps_256:
5896 case Intrinsic::x86_avx2_pblendvb:
5897 handleBlendvIntrinsic(I);
5898 break;
5899
5900 case Intrinsic::x86_avx_dp_ps_256:
5901 case Intrinsic::x86_sse41_dppd:
5902 case Intrinsic::x86_sse41_dpps:
5903 handleDppIntrinsic(I);
5904 break;
5905
5906 case Intrinsic::x86_mmx_packsswb:
5907 case Intrinsic::x86_mmx_packuswb:
5908 handleVectorPackIntrinsic(I, 16);
5909 break;
5910
5911 case Intrinsic::x86_mmx_packssdw:
5912 handleVectorPackIntrinsic(I, 32);
5913 break;
5914
5915 case Intrinsic::x86_mmx_psad_bw:
5916 handleVectorSadIntrinsic(I, true);
5917 break;
5918 case Intrinsic::x86_sse2_psad_bw:
5919 case Intrinsic::x86_avx2_psad_bw:
5920 handleVectorSadIntrinsic(I);
5921 break;
5922
5923 // Multiply and Add Packed Words
5924 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5925 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5926 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5927 //
5928 // Multiply and Add Packed Signed and Unsigned Bytes
5929 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5930 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5931 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5932 //
5933 // These intrinsics are auto-upgraded into non-masked forms:
5934 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5935 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5936 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5937 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5938 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5939 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5940 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5941 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5942 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5943 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5944 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5945 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5946 case Intrinsic::x86_sse2_pmadd_wd:
5947 case Intrinsic::x86_avx2_pmadd_wd:
5948 case Intrinsic::x86_avx512_pmaddw_d_512:
5949 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5950 case Intrinsic::x86_avx2_pmadd_ub_sw:
5951 case Intrinsic::x86_avx512_pmaddubs_w_512:
5952 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5953 /*ZeroPurifies=*/true,
5954 /*EltSizeInBits=*/0,
5955 /*Lanes=*/kBothLanes);
5956 break;
5957
5958 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5959 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5960 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5961 /*ZeroPurifies=*/true,
5962 /*EltSizeInBits=*/8,
5963 /*Lanes=*/kBothLanes);
5964 break;
5965
5966 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5967 case Intrinsic::x86_mmx_pmadd_wd:
5968 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5969 /*ZeroPurifies=*/true,
5970 /*EltSizeInBits=*/16,
5971 /*Lanes=*/kBothLanes);
5972 break;
5973
5974 // BFloat16 multiply-add to single-precision
5975 // <4 x float> llvm.aarch64.neon.bfmlalt
5976 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
5977 case Intrinsic::aarch64_neon_bfmlalt:
5978 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5979 /*ZeroPurifies=*/false,
5980 /*EltSizeInBits=*/0,
5981 /*Lanes=*/kOddLanes);
5982 break;
5983
5984 // <4 x float> llvm.aarch64.neon.bfmlalb
5985 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
5986 case Intrinsic::aarch64_neon_bfmlalb:
5987 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5988 /*ZeroPurifies=*/false,
5989 /*EltSizeInBits=*/0,
5990 /*Lanes=*/kEvenLanes);
5991 break;
5992
5993 // AVX Vector Neural Network Instructions: bytes
5994 //
5995 // Multiply and Add Signed Bytes
5996 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5997 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5998 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5999 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6000 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
6001 // (<16 x i32>, <64 x i8>, <64 x i8>)
6002 //
6003 // Multiply and Add Signed Bytes With Saturation
6004 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
6005 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6006 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
6007 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6008 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
6009 // (<16 x i32>, <64 x i8>, <64 x i8>)
6010 //
6011 // Multiply and Add Signed and Unsigned Bytes
6012 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
6013 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6014 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
6015 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6016 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
6017 // (<16 x i32>, <64 x i8>, <64 x i8>)
6018 //
6019 // Multiply and Add Signed and Unsigned Bytes With Saturation
6020 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
6021 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6022 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
6023 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6024 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
6025 // (<16 x i32>, <64 x i8>, <64 x i8>)
6026 //
6027 // Multiply and Add Unsigned and Signed Bytes
6028 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
6029 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6030 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
6031 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6032 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
6033 // (<16 x i32>, <64 x i8>, <64 x i8>)
6034 //
6035 // Multiply and Add Unsigned and Signed Bytes With Saturation
6036 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
6037 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6038 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
6039 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6040 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
6041 // (<16 x i32>, <64 x i8>, <64 x i8>)
6042 //
6043 // Multiply and Add Unsigned Bytes
6044 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
6045 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6046 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
6047 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6048 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
6049 // (<16 x i32>, <64 x i8>, <64 x i8>)
6050 //
6051 // Multiply and Add Unsigned Bytes With Saturation
6052 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
6053 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6054 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
6055 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6056 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
6057 // (<16 x i32>, <64 x i8>, <64 x i8>)
6058 //
6059 // These intrinsics are auto-upgraded into non-masked forms:
6060 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
6061 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6062 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
6063 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6064 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
6065 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6066 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
6067 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6068 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
6069 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6070 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
6071 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6072 //
6073 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
6074 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6075 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
6076 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6077 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
6078 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6079 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
6080 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6081 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
6082 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6083 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
6084 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6085 case Intrinsic::x86_avx512_vpdpbusd_128:
6086 case Intrinsic::x86_avx512_vpdpbusd_256:
6087 case Intrinsic::x86_avx512_vpdpbusd_512:
6088 case Intrinsic::x86_avx512_vpdpbusds_128:
6089 case Intrinsic::x86_avx512_vpdpbusds_256:
6090 case Intrinsic::x86_avx512_vpdpbusds_512:
6091 case Intrinsic::x86_avx2_vpdpbssd_128:
6092 case Intrinsic::x86_avx2_vpdpbssd_256:
6093 case Intrinsic::x86_avx10_vpdpbssd_512:
6094 case Intrinsic::x86_avx2_vpdpbssds_128:
6095 case Intrinsic::x86_avx2_vpdpbssds_256:
6096 case Intrinsic::x86_avx10_vpdpbssds_512:
6097 case Intrinsic::x86_avx2_vpdpbsud_128:
6098 case Intrinsic::x86_avx2_vpdpbsud_256:
6099 case Intrinsic::x86_avx10_vpdpbsud_512:
6100 case Intrinsic::x86_avx2_vpdpbsuds_128:
6101 case Intrinsic::x86_avx2_vpdpbsuds_256:
6102 case Intrinsic::x86_avx10_vpdpbsuds_512:
6103 case Intrinsic::x86_avx2_vpdpbuud_128:
6104 case Intrinsic::x86_avx2_vpdpbuud_256:
6105 case Intrinsic::x86_avx10_vpdpbuud_512:
6106 case Intrinsic::x86_avx2_vpdpbuuds_128:
6107 case Intrinsic::x86_avx2_vpdpbuuds_256:
6108 case Intrinsic::x86_avx10_vpdpbuuds_512:
6109 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6110 /*ZeroPurifies=*/true,
6111 /*EltSizeInBits=*/0,
6112 /*Lanes=*/kBothLanes);
6113 break;
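// As a rough sketch of the shadow rule applied here (an interpretation of
// handleVectorDotProductIntrinsic's parameters, not text from the source):
// each i32 output lane is
//   out[i] = acc[i] + sum_{j=0..3} a[4*i + j] * b[4*i + j]
// so the output shadow ORs the accumulator shadow with the shadow of every
// contributing byte pair. ZeroPurifies=true presumably means a pair with a
// fully-initialized zero on either side contributes nothing, because
// 0 * x == 0 no matter how uninitialized x is.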
6114
6115 // AVX Vector Neural Network Instructions: words
6116 //
6117 // Multiply and Add Signed Word Integers
6118 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
6119 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6120 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
6121 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6122 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
6123 // (<16 x i32>, <32 x i16>, <32 x i16>)
6124 //
6125 // Multiply and Add Signed Word Integers With Saturation
6126 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
6127 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6128 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
6129 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6130 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
6131 // (<16 x i32>, <32 x i16>, <32 x i16>)
6132 //
6133 // Multiply and Add Signed and Unsigned Word Integers
6134 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
6135 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6136 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
6137 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6138 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
6139 // (<16 x i32>, <32 x i16>, <32 x i16>)
6140 //
6141 // Multiply and Add Signed and Unsigned Word Integers With Saturation
6142 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
6143 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6144 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
6145 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6146 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
6147 // (<16 x i32>, <32 x i16>, <32 x i16>)
6148 //
6149 // Multiply and Add Unsigned and Signed Word Integers
6150 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
6151 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6152 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
6153 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6154 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
6155 // (<16 x i32>, <32 x i16>, <32 x i16>)
6156 //
6157 // Multiply and Add Unsigned and Signed Word Integers With Saturation
6158 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
6159 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6160 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
6161 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6162 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
6163 // (<16 x i32>, <32 x i16>, <32 x i16>)
6164 //
6165 // Multiply and Add Unsigned and Unsigned Word Integers
6166 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
6167 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6168 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
6169 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6170 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
6171 // (<16 x i32>, <32 x i16>, <32 x i16>)
6172 //
6173 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
6174 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
6175 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6176 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
6177 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6178 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
6179 // (<16 x i32>, <32 x i16>, <32 x i16>)
6180 //
6181 // These intrinsics are auto-upgraded into non-masked forms:
6182 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
6183 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6184 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
6185 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6186 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6187 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6188 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6189 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6190 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6191 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6192 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6193 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6194 //
6195 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6196 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6197 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6198 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6199 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6200 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6201 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6202 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6203 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6204 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6205 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6206 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6207 case Intrinsic::x86_avx512_vpdpwssd_128:
6208 case Intrinsic::x86_avx512_vpdpwssd_256:
6209 case Intrinsic::x86_avx512_vpdpwssd_512:
6210 case Intrinsic::x86_avx512_vpdpwssds_128:
6211 case Intrinsic::x86_avx512_vpdpwssds_256:
6212 case Intrinsic::x86_avx512_vpdpwssds_512:
6213 case Intrinsic::x86_avx2_vpdpwsud_128:
6214 case Intrinsic::x86_avx2_vpdpwsud_256:
6215 case Intrinsic::x86_avx10_vpdpwsud_512:
6216 case Intrinsic::x86_avx2_vpdpwsuds_128:
6217 case Intrinsic::x86_avx2_vpdpwsuds_256:
6218 case Intrinsic::x86_avx10_vpdpwsuds_512:
6219 case Intrinsic::x86_avx2_vpdpwusd_128:
6220 case Intrinsic::x86_avx2_vpdpwusd_256:
6221 case Intrinsic::x86_avx10_vpdpwusd_512:
6222 case Intrinsic::x86_avx2_vpdpwusds_128:
6223 case Intrinsic::x86_avx2_vpdpwusds_256:
6224 case Intrinsic::x86_avx10_vpdpwusds_512:
6225 case Intrinsic::x86_avx2_vpdpwuud_128:
6226 case Intrinsic::x86_avx2_vpdpwuud_256:
6227 case Intrinsic::x86_avx10_vpdpwuud_512:
6228 case Intrinsic::x86_avx2_vpdpwuuds_128:
6229 case Intrinsic::x86_avx2_vpdpwuuds_256:
6230 case Intrinsic::x86_avx10_vpdpwuuds_512:
6231 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6232 /*ZeroPurifies=*/true,
6233 /*EltSizeInBits=*/0,
6234 /*Lanes=*/kBothLanes);
6235 break;
6236
6237 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6238 // Precision
6239 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6240 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6241 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6242 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6243 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6244 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
6245 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6246 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6247 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6248 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6249 /*ZeroPurifies=*/false,
6250 /*EltSizeInBits=*/0,
6251 /*Lanes=*/kBothLanes);
6252 break;
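// Presumably ZeroPurifies is false here, unlike the integer dot products
// above, because a floating-point multiply by a known zero does not yield a
// known result: 0.0 * NaN and 0.0 * Inf are NaN, so an uninitialized
// multiplicand cannot be ignored merely because the other operand is zero.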
6253
6254 case Intrinsic::x86_sse_cmp_ss:
6255 case Intrinsic::x86_sse2_cmp_sd:
6256 case Intrinsic::x86_sse_comieq_ss:
6257 case Intrinsic::x86_sse_comilt_ss:
6258 case Intrinsic::x86_sse_comile_ss:
6259 case Intrinsic::x86_sse_comigt_ss:
6260 case Intrinsic::x86_sse_comige_ss:
6261 case Intrinsic::x86_sse_comineq_ss:
6262 case Intrinsic::x86_sse_ucomieq_ss:
6263 case Intrinsic::x86_sse_ucomilt_ss:
6264 case Intrinsic::x86_sse_ucomile_ss:
6265 case Intrinsic::x86_sse_ucomigt_ss:
6266 case Intrinsic::x86_sse_ucomige_ss:
6267 case Intrinsic::x86_sse_ucomineq_ss:
6268 case Intrinsic::x86_sse2_comieq_sd:
6269 case Intrinsic::x86_sse2_comilt_sd:
6270 case Intrinsic::x86_sse2_comile_sd:
6271 case Intrinsic::x86_sse2_comigt_sd:
6272 case Intrinsic::x86_sse2_comige_sd:
6273 case Intrinsic::x86_sse2_comineq_sd:
6274 case Intrinsic::x86_sse2_ucomieq_sd:
6275 case Intrinsic::x86_sse2_ucomilt_sd:
6276 case Intrinsic::x86_sse2_ucomile_sd:
6277 case Intrinsic::x86_sse2_ucomigt_sd:
6278 case Intrinsic::x86_sse2_ucomige_sd:
6279 case Intrinsic::x86_sse2_ucomineq_sd:
6280 handleVectorCompareScalarIntrinsic(I);
6281 break;
6282
6283 case Intrinsic::x86_avx_cmp_pd_256:
6284 case Intrinsic::x86_avx_cmp_ps_256:
6285 case Intrinsic::x86_sse2_cmp_pd:
6286 case Intrinsic::x86_sse_cmp_ps:
6287 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/true);
6288 break;
6289
6290 case Intrinsic::x86_bmi_bextr_32:
6291 case Intrinsic::x86_bmi_bextr_64:
6292 case Intrinsic::x86_bmi_bzhi_32:
6293 case Intrinsic::x86_bmi_bzhi_64:
6294 case Intrinsic::x86_bmi_pdep_32:
6295 case Intrinsic::x86_bmi_pdep_64:
6296 case Intrinsic::x86_bmi_pext_32:
6297 case Intrinsic::x86_bmi_pext_64:
6298 handleBmiIntrinsic(I);
6299 break;
6300
6301 case Intrinsic::x86_pclmulqdq:
6302 case Intrinsic::x86_pclmulqdq_256:
6303 case Intrinsic::x86_pclmulqdq_512:
6304 handlePclmulIntrinsic(I);
6305 break;
6306
6307 case Intrinsic::x86_avx_round_pd_256:
6308 case Intrinsic::x86_avx_round_ps_256:
6309 case Intrinsic::x86_sse41_round_pd:
6310 case Intrinsic::x86_sse41_round_ps:
6311 handleRoundPdPsIntrinsic(I);
6312 break;
6313
6314 case Intrinsic::x86_sse41_round_sd:
6315 case Intrinsic::x86_sse41_round_ss:
6316 handleUnarySdSsIntrinsic(I);
6317 break;
6318
6319 case Intrinsic::x86_sse2_max_sd:
6320 case Intrinsic::x86_sse_max_ss:
6321 case Intrinsic::x86_sse2_min_sd:
6322 case Intrinsic::x86_sse_min_ss:
6323 handleBinarySdSsIntrinsic(I);
6324 break;
6325
6326 case Intrinsic::x86_avx_vtestc_pd:
6327 case Intrinsic::x86_avx_vtestc_pd_256:
6328 case Intrinsic::x86_avx_vtestc_ps:
6329 case Intrinsic::x86_avx_vtestc_ps_256:
6330 case Intrinsic::x86_avx_vtestnzc_pd:
6331 case Intrinsic::x86_avx_vtestnzc_pd_256:
6332 case Intrinsic::x86_avx_vtestnzc_ps:
6333 case Intrinsic::x86_avx_vtestnzc_ps_256:
6334 case Intrinsic::x86_avx_vtestz_pd:
6335 case Intrinsic::x86_avx_vtestz_pd_256:
6336 case Intrinsic::x86_avx_vtestz_ps:
6337 case Intrinsic::x86_avx_vtestz_ps_256:
6338 case Intrinsic::x86_avx_ptestc_256:
6339 case Intrinsic::x86_avx_ptestnzc_256:
6340 case Intrinsic::x86_avx_ptestz_256:
6341 case Intrinsic::x86_sse41_ptestc:
6342 case Intrinsic::x86_sse41_ptestnzc:
6343 case Intrinsic::x86_sse41_ptestz:
6344 handleVtestIntrinsic(I);
6345 break;
6346
6347 // Packed Horizontal Add/Subtract
6348 case Intrinsic::x86_ssse3_phadd_w:
6349 case Intrinsic::x86_ssse3_phadd_w_128:
6350 case Intrinsic::x86_ssse3_phsub_w:
6351 case Intrinsic::x86_ssse3_phsub_w_128:
6352 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6353 /*ReinterpretElemWidth=*/16);
6354 break;
6355
6356 case Intrinsic::x86_avx2_phadd_w:
6357 case Intrinsic::x86_avx2_phsub_w:
6358 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6359 /*ReinterpretElemWidth=*/16);
6360 break;
6361
6362 // Packed Horizontal Add/Subtract
6363 case Intrinsic::x86_ssse3_phadd_d:
6364 case Intrinsic::x86_ssse3_phadd_d_128:
6365 case Intrinsic::x86_ssse3_phsub_d:
6366 case Intrinsic::x86_ssse3_phsub_d_128:
6367 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6368 /*ReinterpretElemWidth=*/32);
6369 break;
6370
6371 case Intrinsic::x86_avx2_phadd_d:
6372 case Intrinsic::x86_avx2_phsub_d:
6373 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6374 /*ReinterpretElemWidth=*/32);
6375 break;
6376
6377 // Packed Horizontal Add/Subtract and Saturate
6378 case Intrinsic::x86_ssse3_phadd_sw:
6379 case Intrinsic::x86_ssse3_phadd_sw_128:
6380 case Intrinsic::x86_ssse3_phsub_sw:
6381 case Intrinsic::x86_ssse3_phsub_sw_128:
6382 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6383 /*ReinterpretElemWidth=*/16);
6384 break;
6385
6386 case Intrinsic::x86_avx2_phadd_sw:
6387 case Intrinsic::x86_avx2_phsub_sw:
6388 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6389 /*ReinterpretElemWidth=*/16);
6390 break;
6391
6392 // Packed Single/Double Precision Floating-Point Horizontal Add
6393 case Intrinsic::x86_sse3_hadd_ps:
6394 case Intrinsic::x86_sse3_hadd_pd:
6395 case Intrinsic::x86_sse3_hsub_ps:
6396 case Intrinsic::x86_sse3_hsub_pd:
6397 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6398 break;
6399
6400 case Intrinsic::x86_avx_hadd_pd_256:
6401 case Intrinsic::x86_avx_hadd_ps_256:
6402 case Intrinsic::x86_avx_hsub_pd_256:
6403 case Intrinsic::x86_avx_hsub_ps_256:
6404 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6405 break;
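// Sketch of the pairwise rule (assumed from the handler's name, not quoted
// from its definition): each hadd/hsub output element is computed from a
// pair of adjacent elements of one input, so its shadow is the OR of that
// pair's shadows, Sout[i] = Sin[2*i] | Sin[2*i+1]. The Shards argument
// presumably reflects that the 256-bit AVX forms operate on each 128-bit
// half independently, so pairs are formed within a half, never across it.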
6406
6407 case Intrinsic::x86_avx_maskstore_ps:
6408 case Intrinsic::x86_avx_maskstore_pd:
6409 case Intrinsic::x86_avx_maskstore_ps_256:
6410 case Intrinsic::x86_avx_maskstore_pd_256:
6411 case Intrinsic::x86_avx2_maskstore_d:
6412 case Intrinsic::x86_avx2_maskstore_q:
6413 case Intrinsic::x86_avx2_maskstore_d_256:
6414 case Intrinsic::x86_avx2_maskstore_q_256: {
6415 handleAVXMaskedStore(I);
6416 break;
6417 }
6418
6419 case Intrinsic::x86_avx_maskload_ps:
6420 case Intrinsic::x86_avx_maskload_pd:
6421 case Intrinsic::x86_avx_maskload_ps_256:
6422 case Intrinsic::x86_avx_maskload_pd_256:
6423 case Intrinsic::x86_avx2_maskload_d:
6424 case Intrinsic::x86_avx2_maskload_q:
6425 case Intrinsic::x86_avx2_maskload_d_256:
6426 case Intrinsic::x86_avx2_maskload_q_256: {
6427 handleAVXMaskedLoad(I);
6428 break;
6429 }
6430
6431 // Packed floating-point arithmetic and min/max, with rounding-mode operand
6432 case Intrinsic::x86_avx512fp16_add_ph_512:
6433 case Intrinsic::x86_avx512fp16_sub_ph_512:
6434 case Intrinsic::x86_avx512fp16_mul_ph_512:
6435 case Intrinsic::x86_avx512fp16_div_ph_512:
6436 case Intrinsic::x86_avx512fp16_max_ph_512:
6437 case Intrinsic::x86_avx512fp16_min_ph_512:
6438 case Intrinsic::x86_avx512_min_ps_512:
6439 case Intrinsic::x86_avx512_min_pd_512:
6440 case Intrinsic::x86_avx512_max_ps_512:
6441 case Intrinsic::x86_avx512_max_pd_512: {
6442 // These AVX512 variants contain the rounding mode as a trailing flag.
6443 // Earlier variants do not have a trailing flag and are already handled
6444 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6445 // maybeHandleUnknownIntrinsic.
6446 [[maybe_unused]] bool Success =
6447 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6448 assert(Success);
6449 break;
6450 }
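// For illustration (signature assumed, not copied from the intrinsic
// definitions), the 512-bit forms look roughly like
//   <16 x float> @llvm.x86.avx512.max.ps.512(<16 x float>, <16 x float>, i32)
// where the trailing i32 selects the rounding mode. Passing trailingFlags=1
// presumably restricts shadow propagation to the data operands and leaves
// the rounding-mode flag, normally a constant, out of it.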
6451
6452 case Intrinsic::x86_avx_vpermilvar_pd:
6453 case Intrinsic::x86_avx_vpermilvar_pd_256:
6454 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6455 case Intrinsic::x86_avx_vpermilvar_ps:
6456 case Intrinsic::x86_avx_vpermilvar_ps_256:
6457 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6458 handleAVXVpermilvar(I);
6459 break;
6460 }
6461
6462 case Intrinsic::x86_avx512_vpermi2var_d_128:
6463 case Intrinsic::x86_avx512_vpermi2var_d_256:
6464 case Intrinsic::x86_avx512_vpermi2var_d_512:
6465 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6466 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6467 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6468 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6469 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6470 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6471 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6472 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6473 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6474 case Intrinsic::x86_avx512_vpermi2var_q_128:
6475 case Intrinsic::x86_avx512_vpermi2var_q_256:
6476 case Intrinsic::x86_avx512_vpermi2var_q_512:
6477 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6478 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6479 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6480 handleAVXVpermi2var(I);
6481 break;
6482
6483 // Packed Shuffle
6484 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6485 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6486 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6487 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6488 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6489 //
6490 // The following intrinsics are auto-upgraded:
6491 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6492 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6493 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6494 case Intrinsic::x86_avx2_pshuf_b:
6495 case Intrinsic::x86_sse_pshuf_w:
6496 case Intrinsic::x86_ssse3_pshuf_b_128:
6497 case Intrinsic::x86_ssse3_pshuf_b:
6498 case Intrinsic::x86_avx512_pshuf_b_512:
6499 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6500 /*trailingVerbatimArgs=*/1);
6501 break;
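// Sketch: because pshuf.b moves whole bytes without computing on them, the
// shadow can be permuted exactly like the data, e.g. (assumed IR, not from
// the source):
//   %sres = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %shadow_of_a,
//                                                      <16 x i8> %b)
// with the control operand %b passed through verbatim
// (trailingVerbatimArgs=1).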
6502
6503 // AVX512 PMOV: Packed MOV, with truncation
6504 // Precisely handled by applying the same intrinsic to the shadow
6505 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6506 case Intrinsic::x86_avx512_mask_pmov_db_512:
6507 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6508 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6509 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6510 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6511 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6512 /*trailingVerbatimArgs=*/1);
6513 break;
6514 }
6515
6516 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6517 // Approximately handled using the corresponding truncation intrinsic
6518 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6519 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6520 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6521 handleIntrinsicByApplyingToShadow(I,
6522 Intrinsic::x86_avx512_mask_pmov_dw_512,
6523 /* trailingVerbatimArgs=*/1);
6524 break;
6525 }
6526
6527 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6528 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6529 handleIntrinsicByApplyingToShadow(I,
6530 Intrinsic::x86_avx512_mask_pmov_db_512,
6531 /* trailingVerbatimArgs=*/1);
6532 break;
6533 }
6534
6535 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6536 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6537 handleIntrinsicByApplyingToShadow(I,
6538 Intrinsic::x86_avx512_mask_pmov_qb_512,
6539 /* trailingVerbatimArgs=*/1);
6540 break;
6541 }
6542
6543 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6544 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6545 handleIntrinsicByApplyingToShadow(I,
6546 Intrinsic::x86_avx512_mask_pmov_qw_512,
6547 /* trailingVerbatimArgs=*/1);
6548 break;
6549 }
6550
6551 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6552 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6553 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6554 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6555 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6556 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6557 // slow-path handler.
6558 handleAVX512VectorDownConvert(I);
6559 break;
6560 }
6561
6562 // AVX512/AVX10 Reciprocal Square Root
6563 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6564 // (<16 x float>, <16 x float>, i16)
6565 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6566 // (<8 x float>, <8 x float>, i8)
6567 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6568 // (<4 x float>, <4 x float>, i8)
6569 //
6570 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6571 // (<8 x double>, <8 x double>, i8)
6572 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6573 // (<4 x double>, <4 x double>, i8)
6574 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6575 // (<2 x double>, <2 x double>, i8)
6576 //
6577 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6578 // (<32 x bfloat>, <32 x bfloat>, i32)
6579 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6580 // (<16 x bfloat>, <16 x bfloat>, i16)
6581 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6582 // (<8 x bfloat>, <8 x bfloat>, i8)
6583 //
6584 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6585 // (<32 x half>, <32 x half>, i32)
6586 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6587 // (<16 x half>, <16 x half>, i16)
6588 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6589 // (<8 x half>, <8 x half>, i8)
6590 //
6591 // TODO: 3-operand variants are not handled:
6592 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6593 // (<2 x double>, <2 x double>, <2 x double>, i8)
6594 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6595 // (<4 x float>, <4 x float>, <4 x float>, i8)
6596 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6597 // (<8 x half>, <8 x half>, <8 x half>, i8)
6598 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6599 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6600 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6601 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6602 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6603 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6604 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6605 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6606 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6607 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6608 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6609 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6610 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6611 /*MaskIndex=*/2);
6612 break;
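// Sketch of the masked semantics these intrinsics share (assumed from the
// operand roles named above, not quoted from their definitions):
//   Result[i] = Mask[i] ? f(A[i]) : WriteThru[i]
// so a reasonable per-element shadow is select(Mask[i], Sa[i], Swt[i]),
// widened by the shadow of the mask itself; the AIndex/WriteThruIndex/
// MaskIndex arguments tell handleAVX512VectorGenericMaskedFP which operand
// plays which role.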
6613
6614 // AVX512/AVX10 Reciprocal
6615 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6616 // (<16 x float>, <16 x float>, i16)
6617 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6618 // (<8 x float>, <8 x float>, i8)
6619 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6620 // (<4 x float>, <4 x float>, i8)
6621 //
6622 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6623 // (<8 x double>, <8 x double>, i8)
6624 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6625 // (<4 x double>, <4 x double>, i8)
6626 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6627 // (<2 x double>, <2 x double>, i8)
6628 //
6629 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6630 // (<32 x bfloat>, <32 x bfloat>, i32)
6631 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6632 // (<16 x bfloat>, <16 x bfloat>, i16)
6633 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6634 // (<8 x bfloat>, <8 x bfloat>, i8)
6635 //
6636 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6637 // (<32 x half>, <32 x half>, i32)
6638 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6639 // (<16 x half>, <16 x half>, i16)
6640 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6641 // (<8 x half>, <8 x half>, i8)
6642 //
6643 // TODO: 3-operand variants are not handled:
6644 // <2 x double> @llvm.x86.avx512.rcp14.sd
6645 // (<2 x double>, <2 x double>, <2 x double>, i8)
6646 // <4 x float> @llvm.x86.avx512.rcp14.ss
6647 // (<4 x float>, <4 x float>, <4 x float>, i8)
6648 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6649 // (<8 x half>, <8 x half>, <8 x half>, i8)
6650 case Intrinsic::x86_avx512_rcp14_ps_512:
6651 case Intrinsic::x86_avx512_rcp14_ps_256:
6652 case Intrinsic::x86_avx512_rcp14_ps_128:
6653 case Intrinsic::x86_avx512_rcp14_pd_512:
6654 case Intrinsic::x86_avx512_rcp14_pd_256:
6655 case Intrinsic::x86_avx512_rcp14_pd_128:
6656 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6657 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6658 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6659 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6660 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6661 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6662 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6663 /*MaskIndex=*/2);
6664 break;
6665
6666 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6667 // (<32 x half>, i32, <32 x half>, i32, i32)
6668 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6669 // (<16 x half>, i32, <16 x half>, i32, i16)
6670 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6671 // (<8 x half>, i32, <8 x half>, i32, i8)
6672 //
6673 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6674 // (<16 x float>, i32, <16 x float>, i16, i32)
6675 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6676 // (<8 x float>, i32, <8 x float>, i8)
6677 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6678 // (<4 x float>, i32, <4 x float>, i8)
6679 //
6680 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6681 // (<8 x double>, i32, <8 x double>, i8, i32)
6682 // A Imm WriteThru Mask Rounding
6683 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6684 // (<4 x double>, i32, <4 x double>, i8)
6685 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6686 // (<2 x double>, i32, <2 x double>, i8)
6687 // A Imm WriteThru Mask
6688 //
6689 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6690 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6691 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6692 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6693 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6694 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6695 //
6696 // Not supported: three vectors
6697 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6698 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6699 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6700 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6701 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6702 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6703 // i32)
6704 // A B WriteThru Mask Imm
6705 // Rounding
6706 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6707 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6708 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6709 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6710 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6711 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6712 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6713 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6714 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6715 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6716 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6717 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6718 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6719 /*MaskIndex=*/3);
6720 break;
6721
6722 // AVX512 FP16 Arithmetic
6723 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6724 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6725 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6726 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6727 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6728 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6729 visitGenericScalarHalfwordInst(I);
6730 break;
6731 }
6732
6733 // AVX Galois Field New Instructions
6734 case Intrinsic::x86_vgf2p8affineqb_128:
6735 case Intrinsic::x86_vgf2p8affineqb_256:
6736 case Intrinsic::x86_vgf2p8affineqb_512:
6737 handleAVXGF2P8Affine(I);
6738 break;
6739
6740 default:
6741 return false;
6742 }
6743
6744 return true;
6745 }
6746
6747 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6748 switch (I.getIntrinsicID()) {
6749 // Two operands e.g.,
6750 // - <8 x i8> @llvm.aarch64.neon.rshrn.v8i8 (<8 x i16>, i32)
6751 // - <4 x i16> @llvm.aarch64.neon.uqrshl.v4i16(<4 x i16>, <4 x i16>)
6752 case Intrinsic::aarch64_neon_rshrn:
6753 case Intrinsic::aarch64_neon_sqrshl:
6754 case Intrinsic::aarch64_neon_sqrshrn:
6755 case Intrinsic::aarch64_neon_sqrshrun:
6756 case Intrinsic::aarch64_neon_sqshl:
6757 case Intrinsic::aarch64_neon_sqshlu:
6758 case Intrinsic::aarch64_neon_sqshrn:
6759 case Intrinsic::aarch64_neon_sqshrun:
6760 case Intrinsic::aarch64_neon_srshl:
6761 case Intrinsic::aarch64_neon_sshl:
6762 case Intrinsic::aarch64_neon_uqrshl:
6763 case Intrinsic::aarch64_neon_uqrshrn:
6764 case Intrinsic::aarch64_neon_uqshl:
6765 case Intrinsic::aarch64_neon_uqshrn:
6766 case Intrinsic::aarch64_neon_urshl:
6767 case Intrinsic::aarch64_neon_ushl:
6768 handleVectorShiftIntrinsic(I, /* Variable */ false);
6769 break;
6770
6771 // Vector Shift Left/Right and Insert
6772 //
6773 // Three operands e.g.,
6774 // - <4 x i16> @llvm.aarch64.neon.vsli.v4i16
6775 // (<4 x i16> %a, <4 x i16> %b, i32 %n)
6776 // - <16 x i8> @llvm.aarch64.neon.vsri.v16i8
6777 // (<16 x i8> %a, <16 x i8> %b, i32 %n)
6778 //
6779 // %b is shifted by %n bits, and the "missing" bits are filled in with %a
6780 // (instead of zero-extending/sign-extending).
6781 case Intrinsic::aarch64_neon_vsli:
6782 case Intrinsic::aarch64_neon_vsri:
6783 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6784 /*trailingVerbatimArgs=*/1);
6785 break;
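// Sketch: since SLI/SRI only move and merge bits, the shadow can follow the
// same bit movement, e.g. (assumed IR, not from the source):
//   %s = call <4 x i16> @llvm.aarch64.neon.vsli.v4i16(<4 x i16> %Sa,
//                                                     <4 x i16> %Sb, i32 %n)
// i.e. the two shadow operands are shifted/inserted exactly like the data,
// with the shift amount %n passed through verbatim.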
6786
6787 // TODO: handling max/min similarly to AND/OR may be more precise
6788 // Floating-Point Maximum/Minimum Pairwise
6789 case Intrinsic::aarch64_neon_fmaxp:
6790 case Intrinsic::aarch64_neon_fminp:
6791 // Floating-Point Maximum/Minimum Number Pairwise
6792 case Intrinsic::aarch64_neon_fmaxnmp:
6793 case Intrinsic::aarch64_neon_fminnmp:
6794 // Signed/Unsigned Maximum/Minimum Pairwise
6795 case Intrinsic::aarch64_neon_smaxp:
6796 case Intrinsic::aarch64_neon_sminp:
6797 case Intrinsic::aarch64_neon_umaxp:
6798 case Intrinsic::aarch64_neon_uminp:
6799 // Add Pairwise
6800 case Intrinsic::aarch64_neon_addp:
6801 // Floating-point Add Pairwise
6802 case Intrinsic::aarch64_neon_faddp:
6803 // Add Long Pairwise
6804 case Intrinsic::aarch64_neon_saddlp:
6805 case Intrinsic::aarch64_neon_uaddlp: {
6806 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6807 break;
6808 }
6809
6810 // Floating-point Convert to integer, rounding to nearest with ties to Away
6811 case Intrinsic::aarch64_neon_fcvtas:
6812 case Intrinsic::aarch64_neon_fcvtau:
6813 // Floating-point convert to integer, rounding toward minus infinity
6814 case Intrinsic::aarch64_neon_fcvtms:
6815 case Intrinsic::aarch64_neon_fcvtmu:
6816 // Floating-point convert to integer, rounding to nearest with ties to even
6817 case Intrinsic::aarch64_neon_fcvtns:
6818 case Intrinsic::aarch64_neon_fcvtnu:
6819 // Floating-point convert to integer, rounding toward plus infinity
6820 case Intrinsic::aarch64_neon_fcvtps:
6821 case Intrinsic::aarch64_neon_fcvtpu:
6822 // Floating-point Convert to integer, rounding toward Zero
6823 case Intrinsic::aarch64_neon_fcvtzs:
6824 case Intrinsic::aarch64_neon_fcvtzu:
6825 // Floating-point convert to lower precision narrow, rounding to odd
6826 case Intrinsic::aarch64_neon_fcvtxn:
6827 // Vector Conversions Between Half-Precision and Single-Precision
6828 case Intrinsic::aarch64_neon_vcvthf2fp:
6829 case Intrinsic::aarch64_neon_vcvtfp2hf:
6830 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/false);
6831 break;
6832
6833 // Vector Conversions Between Fixed-Point and Floating-Point
6834 case Intrinsic::aarch64_neon_vcvtfxs2fp:
6835 case Intrinsic::aarch64_neon_vcvtfp2fxs:
6836 case Intrinsic::aarch64_neon_vcvtfxu2fp:
6837 case Intrinsic::aarch64_neon_vcvtfp2fxu:
6838 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/true);
6839 break;
6840
6841 // TODO: bfloat conversions
6842 // - bfloat @llvm.aarch64.neon.bfcvt(float)
6843 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn(<4 x float>)
6844 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn2(<8 x bfloat>, <4 x float>)
6845
6846 // Add reduction to scalar
6847 case Intrinsic::aarch64_neon_faddv:
6848 case Intrinsic::aarch64_neon_saddv:
6849 case Intrinsic::aarch64_neon_uaddv:
6850 // Signed/Unsigned min/max (Vector)
6851 // TODO: handling similarly to AND/OR may be more precise.
6852 case Intrinsic::aarch64_neon_smaxv:
6853 case Intrinsic::aarch64_neon_sminv:
6854 case Intrinsic::aarch64_neon_umaxv:
6855 case Intrinsic::aarch64_neon_uminv:
6856 // Floating-point min/max (vector)
6857 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6858 // but our shadow propagation is the same.
6859 case Intrinsic::aarch64_neon_fmaxv:
6860 case Intrinsic::aarch64_neon_fminv:
6861 case Intrinsic::aarch64_neon_fmaxnmv:
6862 case Intrinsic::aarch64_neon_fminnmv:
6863 // Sum long across vector
6864 case Intrinsic::aarch64_neon_saddlv:
6865 case Intrinsic::aarch64_neon_uaddlv:
6866 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6867 break;
6868
6869 case Intrinsic::aarch64_neon_ld1x2:
6870 case Intrinsic::aarch64_neon_ld1x3:
6871 case Intrinsic::aarch64_neon_ld1x4:
6872 case Intrinsic::aarch64_neon_ld2:
6873 case Intrinsic::aarch64_neon_ld3:
6874 case Intrinsic::aarch64_neon_ld4:
6875 case Intrinsic::aarch64_neon_ld2r:
6876 case Intrinsic::aarch64_neon_ld3r:
6877 case Intrinsic::aarch64_neon_ld4r: {
6878 handleNEONVectorLoad(I, /*WithLane=*/false);
6879 break;
6880 }
6881
6882 case Intrinsic::aarch64_neon_ld2lane:
6883 case Intrinsic::aarch64_neon_ld3lane:
6884 case Intrinsic::aarch64_neon_ld4lane: {
6885 handleNEONVectorLoad(I, /*WithLane=*/true);
6886 break;
6887 }
6888
6889 // Saturating extract narrow
6890 case Intrinsic::aarch64_neon_sqxtn:
6891 case Intrinsic::aarch64_neon_sqxtun:
6892 case Intrinsic::aarch64_neon_uqxtn:
6893 // These only have one argument, but we (ab)use handleShadowOr because it
6894 // does work on single argument intrinsics and will typecast the shadow
6895 // (and update the origin).
6896 handleShadowOr(I);
6897 break;
6898
6899 case Intrinsic::aarch64_neon_st1x2:
6900 case Intrinsic::aarch64_neon_st1x3:
6901 case Intrinsic::aarch64_neon_st1x4:
6902 case Intrinsic::aarch64_neon_st2:
6903 case Intrinsic::aarch64_neon_st3:
6904 case Intrinsic::aarch64_neon_st4: {
6905 handleNEONVectorStoreIntrinsic(I, false);
6906 break;
6907 }
6908
6909 case Intrinsic::aarch64_neon_st2lane:
6910 case Intrinsic::aarch64_neon_st3lane:
6911 case Intrinsic::aarch64_neon_st4lane: {
6912 handleNEONVectorStoreIntrinsic(I, true);
6913 break;
6914 }
6915
6916 // Arm NEON vector table intrinsics have the source/table register(s) as
6917 // arguments, followed by the index register. They return the output.
6918 //
6919 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6920 // original value unchanged in the destination register.'
6921 // Conveniently, zero denotes a clean shadow, which means out-of-range
6922 // indices for TBL will initialize the user data with zero and also clean
6923 // the shadow. (For TBX, neither the user data nor the shadow will be
6924 // updated, which is also correct.)
6925 case Intrinsic::aarch64_neon_tbl1:
6926 case Intrinsic::aarch64_neon_tbl2:
6927 case Intrinsic::aarch64_neon_tbl3:
6928 case Intrinsic::aarch64_neon_tbl4:
6929 case Intrinsic::aarch64_neon_tbx1:
6930 case Intrinsic::aarch64_neon_tbx2:
6931 case Intrinsic::aarch64_neon_tbx3:
6932 case Intrinsic::aarch64_neon_tbx4: {
6933 // The last trailing argument (index register) should be handled verbatim
6934 handleIntrinsicByApplyingToShadow(
6935 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6936 /*trailingVerbatimArgs*/ 1);
6937 break;
6938 }
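// Sketch (assumed IR, not quoted from the source): for a single-register
// table lookup the shadow is computed by the very same lookup, e.g.
//   %s = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %Stable,
//                                                   <8 x i8> %index)
// so in-range indices fetch the table element's shadow and out-of-range
// indices fetch zero, i.e. a clean shadow, matching the TBL data semantics
// described above.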
6939
6940 case Intrinsic::aarch64_neon_fmulx:
6941 case Intrinsic::aarch64_neon_pmul:
6942 case Intrinsic::aarch64_neon_pmull:
6943 case Intrinsic::aarch64_neon_smull:
6944 case Intrinsic::aarch64_neon_pmull64:
6945 case Intrinsic::aarch64_neon_umull: {
6946 handleNEONVectorMultiplyIntrinsic(I);
6947 break;
6948 }
6949
6950 case Intrinsic::aarch64_neon_smmla:
6951 case Intrinsic::aarch64_neon_ummla:
6952 case Intrinsic::aarch64_neon_usmmla:
6953 handleNEONMatrixMultiply(I, /*ARows=*/2, /*ACols=*/8, /*BRows=*/8,
6954 /*BCols=*/2);
6955 break;
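// For reference (signature assumed, not copied from the source), e.g.
//   <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8(<4 x i32> %acc,
//                                                  <16 x i8> %a, <16 x i8> %b)
// treats %a as a 2x8 matrix and %b as an 8x2 matrix and accumulates their
// 2x2 product into %acc, which is what the ARows/ACols/BRows/BCols
// arguments above describe.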
6956
6957 // <2 x i32> @llvm.aarch64.neon.{u,s,us}dot.v2i32.v8i8
6958 // (<2 x i32> %acc, <8 x i8> %a, <8 x i8> %b)
6959 // <4 x i32> @llvm.aarch64.neon.{u,s,us}dot.v4i32.v16i8
6960 // (<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b)
6961 case Intrinsic::aarch64_neon_sdot:
6962 case Intrinsic::aarch64_neon_udot:
6963 case Intrinsic::aarch64_neon_usdot:
6964 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6965 /*ZeroPurifies=*/true,
6966 /*EltSizeInBits=*/0,
6967 /*Lanes=*/kBothLanes);
6968 break;
6969
6970 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
6971 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
6972 // <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16
6973 // (<4 x float> %acc, <8 x bfloat> %a, <8 x bfloat> %b)
6974 case Intrinsic::aarch64_neon_bfdot:
6975 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6976 /*ZeroPurifies=*/false,
6977 /*EltSizeInBits=*/0,
6978 /*Lanes=*/kBothLanes);
6979 break;
6980
6981 default:
6982 return false;
6983 }
6984
6985 return true;
6986 }
6987
6988 void visitIntrinsicInst(IntrinsicInst &I) {
6989 if (maybeHandleCrossPlatformIntrinsic(I))
6990 return;
6991
6992 if (maybeHandleX86SIMDIntrinsic(I))
6993 return;
6994
6995 if (maybeHandleArmSIMDIntrinsic(I))
6996 return;
6997
6998 if (maybeHandleUnknownIntrinsic(I))
6999 return;
7000
7001 visitInstruction(I);
7002 }
7003
7004 void visitLibAtomicLoad(CallBase &CB) {
7005 // Since we use getNextNode here, we can't have CB terminate the BB.
7006 assert(isa<CallInst>(CB));
7007
7008 IRBuilder<> IRB(&CB);
7009 Value *Size = CB.getArgOperand(0);
7010 Value *SrcPtr = CB.getArgOperand(1);
7011 Value *DstPtr = CB.getArgOperand(2);
7012 Value *Ordering = CB.getArgOperand(3);
7013 // Convert the call to have at least Acquire ordering to make sure
7014 // the shadow operations aren't reordered before it.
7015 Value *NewOrdering =
7016 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
7017 CB.setArgOperand(3, NewOrdering);
7018
7019 NextNodeIRBuilder NextIRB(&CB);
7020 Value *SrcShadowPtr, *SrcOriginPtr;
7021 std::tie(SrcShadowPtr, SrcOriginPtr) =
7022 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7023 /*isStore*/ false);
7024 Value *DstShadowPtr =
7025 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7026 /*isStore*/ true)
7027 .first;
7028
7029 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
7030 if (MS.TrackOrigins) {
7031 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
7032 kMinOriginAlignment);
7033 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
7034 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
7035 }
7036 }
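// Sketch of the effect on a user call such as
//   __atomic_load(size, src, dst, order)
// (generic libatomic signature assumed): the shadow, and origin if enabled,
// of the 'size' bytes at 'src' is copied to the shadow of 'dst' after the
// call, and the ordering argument is strengthened to at least Acquire
// beforehand so those shadow accesses cannot be reordered ahead of the
// atomic load itself.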
7037
7038 void visitLibAtomicStore(CallBase &CB) {
7039 IRBuilder<> IRB(&CB);
7040 Value *Size = CB.getArgOperand(0);
7041 Value *DstPtr = CB.getArgOperand(2);
7042 Value *Ordering = CB.getArgOperand(3);
7043 // Convert the call to have at least Release ordering to make sure
7044 // the shadow operations aren't reordered after it.
7045 Value *NewOrdering =
7046 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
7047 CB.setArgOperand(3, NewOrdering);
7048
7049 Value *DstShadowPtr =
7050 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
7051 /*isStore*/ true)
7052 .first;
7053
7054 // Atomic store always paints clean shadow/origin. See file header.
7055 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
7056 Align(1));
7057 }
7058
7059 void visitCallBase(CallBase &CB) {
7060 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
7061 if (CB.isInlineAsm()) {
7062 // For inline asm (either a call to asm function, or callbr instruction),
7063 // do the usual thing: check argument shadow and mark all outputs as
7064 // clean. Note that any side effects of the inline asm that are not
7065 // immediately visible in its constraints are not handled.
7066 if (ClHandleAsmConservative)
7067 visitAsmInstruction(CB);
7068 else
7069 visitInstruction(CB);
7070 return;
7071 }
7072 LibFunc LF;
7073 if (TLI->getLibFunc(CB, LF)) {
7074 // libatomic.a functions need to have special handling because there isn't
7075 // a good way to intercept them or compile the library with
7076 // instrumentation.
7077 switch (LF) {
7078 case LibFunc_atomic_load:
7079 if (!isa<CallInst>(CB)) {
7080 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load. "
7081 "Ignoring!\n";
7082 break;
7083 }
7084 visitLibAtomicLoad(CB);
7085 return;
7086 case LibFunc_atomic_store:
7087 visitLibAtomicStore(CB);
7088 return;
7089 default:
7090 break;
7091 }
7092 }
7093
7094 if (auto *Call = dyn_cast<CallInst>(&CB)) {
7095 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
7096
7097 // We are going to insert code that relies on the fact that the callee
7098 // will become a non-readonly function after it is instrumented by us. To
7099 // prevent this code from being optimized out, mark that function
7100 // non-readonly in advance.
7101 // TODO: We can likely do better than dropping memory() completely here.
7102 AttributeMask B;
7103 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
7104 
7105 Call->removeFnAttrs(B);
7106 if (Function *Func = Call->getCalledFunction()) {
7107 Func->removeFnAttrs(B);
7108 }
7109 
7110 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
7111 }
7112 IRBuilder<> IRB(&CB);
7113 bool MayCheckCall = MS.EagerChecks;
7114 if (Function *Func = CB.getCalledFunction()) {
7115 // __sanitizer_unaligned_{load,store} functions may be called by users
7116 // and always expect shadows in the TLS. So don't check them.
7117 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
7118 }
7119
7120 unsigned ArgOffset = 0;
7121 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
7122 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
7123 if (!A->getType()->isSized()) {
7124 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
7125 continue;
7126 }
7127
7128 if (A->getType()->isScalableTy()) {
7129 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
7130 // Handle as noundef, but don't reserve tls slots.
7131 insertCheckShadowOf(A, &CB);
7132 continue;
7133 }
7134
7135 unsigned Size = 0;
7136 const DataLayout &DL = F.getDataLayout();
7137
7138 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
7139 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
7140 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
7141
7142 if (EagerCheck) {
7143 insertCheckShadowOf(A, &CB);
7144 Size = DL.getTypeAllocSize(A->getType());
7145 } else {
7146 [[maybe_unused]] Value *Store = nullptr;
7147 // Compute the Shadow for arg even if it is ByVal, because
7148 // in that case getShadow() will copy the actual arg shadow to
7149 // __msan_param_tls.
7150 Value *ArgShadow = getShadow(A);
7151 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
7152 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
7153 << " Shadow: " << *ArgShadow << "\n");
7154 if (ByVal) {
7155 // ByVal requires some special handling as it's too big for a single
7156 // load
7157 assert(A->getType()->isPointerTy() &&
7158 "ByVal argument is not a pointer!");
7159 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
7160 if (ArgOffset + Size > kParamTLSSize)
7161 break;
7162 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
7163 MaybeAlign Alignment = std::nullopt;
7164 if (ParamAlignment)
7165 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
7166 Value *AShadowPtr, *AOriginPtr;
7167 std::tie(AShadowPtr, AOriginPtr) =
7168 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
7169 /*isStore*/ false);
7170 if (!PropagateShadow) {
7171 Store = IRB.CreateMemSet(ArgShadowBase,
7172 Constant::getNullValue(IRB.getInt8Ty()),
7173 Size, Alignment);
7174 } else {
7175 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
7176 Alignment, Size);
7177 if (MS.TrackOrigins) {
7178 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
7179 // FIXME: OriginSize should be:
7180 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
7181 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
7182 IRB.CreateMemCpy(
7183 ArgOriginBase,
7184 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
7185 AOriginPtr,
7186 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
7187 }
7188 }
7189 } else {
7190 // Any other parameters mean we need bit-grained tracking of uninit
7191 // data
7192 Size = DL.getTypeAllocSize(A->getType());
7193 if (ArgOffset + Size > kParamTLSSize)
7194 break;
7195 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
7196 kShadowTLSAlignment);
7197 Constant *Cst = dyn_cast<Constant>(ArgShadow);
7198 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
7199 IRB.CreateStore(getOrigin(A),
7200 getOriginPtrForArgument(IRB, ArgOffset));
7201 }
7202 }
7203 assert(Store != nullptr);
7204 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
7205 }
7206 assert(Size != 0);
7207 ArgOffset += alignTo(Size, kShadowTLSAlignment);
7208 }
7209 LLVM_DEBUG(dbgs() << " done with call args\n");
7210
7211 FunctionType *FT = CB.getFunctionType();
7212 if (FT->isVarArg()) {
7213 VAHelper->visitCallBase(CB, IRB);
7214 }
7215
7216 // Now, get the shadow for the RetVal.
7217 if (!CB.getType()->isSized())
7218 return;
7219 // Don't emit the epilogue for musttail call returns.
7220 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
7221 return;
7222
7223 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
7224 setShadow(&CB, getCleanShadow(&CB));
7225 setOrigin(&CB, getCleanOrigin());
7226 return;
7227 }
7228
7229 IRBuilder<> IRBBefore(&CB);
7230 // Until we have full dynamic coverage, make sure the retval shadow is 0.
7231 Value *Base = getShadowPtrForRetval(IRBBefore);
7232 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
7233 kShadowTLSAlignment);
7234 BasicBlock::iterator NextInsn;
7235 if (isa<CallInst>(CB)) {
7236 NextInsn = ++CB.getIterator();
7237 assert(NextInsn != CB.getParent()->end());
7238 } else {
7239 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
7240 if (!NormalDest->getSinglePredecessor()) {
7241 // FIXME: this case is tricky, so we are just conservative here.
7242 // Perhaps we need to split the edge between this BB and NormalDest,
7243 // but a naive attempt to use SplitEdge leads to a crash.
7244 setShadow(&CB, getCleanShadow(&CB));
7245 setOrigin(&CB, getCleanOrigin());
7246 return;
7247 }
7248 // FIXME: NextInsn is likely in a basic block that has not been visited
7249 // yet. Anything inserted there will be instrumented by MSan later!
7250 NextInsn = NormalDest->getFirstInsertionPt();
7251 assert(NextInsn != NormalDest->end() &&
7252 "Could not find insertion point for retval shadow load");
7253 }
7254 IRBuilder<> IRBAfter(&*NextInsn);
7255 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7256 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
7257 "_msret");
7258 setShadow(&CB, RetvalShadow);
7259 if (MS.TrackOrigins)
7260 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
7261 }
7262
7263 bool isAMustTailRetVal(Value *RetVal) {
7264 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
7265 RetVal = I->getOperand(0);
7266 }
7267 if (auto *I = dyn_cast<CallInst>(RetVal)) {
7268 return I->isMustTailCall();
7269 }
7270 return false;
7271 }
7272
7273 void visitReturnInst(ReturnInst &I) {
7274 IRBuilder<> IRB(&I);
7275 Value *RetVal = I.getReturnValue();
7276 if (!RetVal)
7277 return;
7278 // Don't emit the epilogue for musttail call returns.
7279 if (isAMustTailRetVal(RetVal))
7280 return;
7281 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7282 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
7283 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7284 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7285 // must always return fully initialized values. For now, we hardcode "main".
7286 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7287
7288 Value *Shadow = getShadow(RetVal);
7289 bool StoreOrigin = true;
7290 if (EagerCheck) {
7291 insertCheckShadowOf(RetVal, &I);
7292 Shadow = getCleanShadow(RetVal);
7293 StoreOrigin = false;
7294 }
7295
7296 // The caller may still expect information passed over TLS if we pass our
7297 // check
7298 if (StoreShadow) {
7299 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
7300 if (MS.TrackOrigins && StoreOrigin)
7301 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
7302 }
7303 }
7304
7305 void visitPHINode(PHINode &I) {
7306 IRBuilder<> IRB(&I);
7307 if (!PropagateShadow) {
7308 setShadow(&I, getCleanShadow(&I));
7309 setOrigin(&I, getCleanOrigin());
7310 return;
7311 }
7312
7313 ShadowPHINodes.push_back(&I);
7314 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
7315 "_msphi_s"));
7316 if (MS.TrackOrigins)
7317 setOrigin(
7318 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
7319 }
7320
7321 Value *getLocalVarIdptr(AllocaInst &I) {
7322 ConstantInt *IntConst =
7323 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
7324 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7325 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7326 IntConst);
7327 }
7328
7329 Value *getLocalVarDescription(AllocaInst &I) {
7330 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
7331 }
7332
7333 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7334 if (PoisonStack && ClPoisonStackWithCall) {
7335 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
7336 } else {
7337 Value *ShadowBase, *OriginBase;
7338 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
7339 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
7340
7341 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
7342 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
7343 }
7344
7345 if (PoisonStack && MS.TrackOrigins) {
7346 Value *Idptr = getLocalVarIdptr(I);
7347 if (ClPrintStackNames) {
7348 Value *Descr = getLocalVarDescription(I);
7349 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
7350 {&I, Len, Idptr, Descr});
7351 } else {
7352 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
7353 }
7354 }
7355 }
7356
7357 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7358 Value *Descr = getLocalVarDescription(I);
7359 if (PoisonStack) {
7360 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
7361 } else {
7362 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
7363 }
7364 }
7365
7366 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7367 if (!InsPoint)
7368 InsPoint = &I;
7369 NextNodeIRBuilder IRB(InsPoint);
7370 Value *Len = IRB.CreateAllocationSize(MS.IntptrTy, &I);
7371
7372 if (MS.CompileKernel)
7373 poisonAllocaKmsan(I, IRB, Len);
7374 else
7375 poisonAllocaUserspace(I, IRB, Len);
7376 }
7377
7378 void visitAllocaInst(AllocaInst &I) {
7379 setShadow(&I, getCleanShadow(&I));
7380 setOrigin(&I, getCleanOrigin());
7381 // We'll get to this alloca later unless it's poisoned at the corresponding
7382 // llvm.lifetime.start.
7383 AllocaSet.insert(&I);
7384 }
7385
7386 void visitSelectInst(SelectInst &I) {
7387 // a = select b, c, d
7388 Value *B = I.getCondition();
7389 Value *C = I.getTrueValue();
7390 Value *D = I.getFalseValue();
7391
7392 handleSelectLikeInst(I, B, C, D);
7393 }
7394
7395 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7396 IRBuilder<> IRB(&I);
7397
7398 Value *Sb = getShadow(B);
7399 Value *Sc = getShadow(C);
7400 Value *Sd = getShadow(D);
7401
7402 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
7403 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
7404 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
7405
7406 // Result shadow if condition shadow is 0.
7407 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
7408 Value *Sa1;
7409 if (I.getType()->isAggregateType()) {
7410 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7411 // an extra "select". This results in much more compact IR.
7412 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7413 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7414 } else if (isScalableNonVectorType(I.getType())) {
7415 // This is intended to handle target("aarch64.svcount"), which can't be
7416 // handled in the else branch because of incompatibility with CreateXor
7417 // ("The supported LLVM operations on this type are limited to load,
7418 // store, phi, select and alloca instructions").
7419
7420 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7421 // branch as needed instead.
7422 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7423 } else {
7424 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7425 // If Sb (condition is poisoned), look for bits in c and d that are equal
7426 // and both unpoisoned.
7427 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
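// Worked example (illustrative only): if the condition shadow Sb is set,
// c = 0b1010 and d = 0b1011, both with clean shadows, then
// (c^d) | Sc | Sd = 0b0001: only the bit on which c and d disagree is
// reported as uninitialized, since the bits they agree on are defined no
// matter which way the poisoned condition goes.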
7428
7429 // Cast arguments to shadow-compatible type.
7430 C = CreateAppToShadowCast(IRB, C);
7431 D = CreateAppToShadowCast(IRB, D);
7432
7433 // Result shadow if condition shadow is 1.
7434 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7435 }
7436 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7437 setShadow(&I, Sa);
7438 if (MS.TrackOrigins) {
7439 // Origins are always i32, so any vector conditions must be flattened.
7440 // FIXME: consider tracking vector origins for app vectors?
7441 if (B->getType()->isVectorTy()) {
7442 B = convertToBool(B, IRB);
7443 Sb = convertToBool(Sb, IRB);
7444 }
7445 // a = select b, c, d
7446 // Oa = Sb ? Ob : (b ? Oc : Od)
7447 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7448 }
7449 }
7450
7451 void visitLandingPadInst(LandingPadInst &I) {
7452 // Do nothing.
7453 // See https://github.com/google/sanitizers/issues/504
7454 setShadow(&I, getCleanShadow(&I));
7455 setOrigin(&I, getCleanOrigin());
7456 }
7457
7458 void visitCatchSwitchInst(CatchSwitchInst &I) {
7459 setShadow(&I, getCleanShadow(&I));
7460 setOrigin(&I, getCleanOrigin());
7461 }
7462
7463 void visitFuncletPadInst(FuncletPadInst &I) {
7464 setShadow(&I, getCleanShadow(&I));
7465 setOrigin(&I, getCleanOrigin());
7466 }
7467
7468 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7469
7470 void visitExtractValueInst(ExtractValueInst &I) {
7471 IRBuilder<> IRB(&I);
7472 Value *Agg = I.getAggregateOperand();
7473 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7474 Value *AggShadow = getShadow(Agg);
7475 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7476 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7477 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7478 setShadow(&I, ResShadow);
7479 setOriginForNaryOp(I);
7480 }
7481
7482 void visitInsertValueInst(InsertValueInst &I) {
7483 IRBuilder<> IRB(&I);
7484 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7485 Value *AggShadow = getShadow(I.getAggregateOperand());
7486 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7487 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7488 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7489 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7490 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7491 setShadow(&I, Res);
7492 setOriginForNaryOp(I);
7493 }
7494
7495 void dumpInst(Instruction &I) {
7496 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7497 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7498 } else {
7499 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7500 }
7501 errs() << "QQQ " << I << "\n";
7502 }
7503
7504 void visitResumeInst(ResumeInst &I) {
7505 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7506 // Nothing to do here.
7507 }
7508
7509 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7510 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7511 // Nothing to do here.
7512 }
7513
7514 void visitCatchReturnInst(CatchReturnInst &CRI) {
7515 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7516 // Nothing to do here.
7517 }
7518
7519 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7520 IRBuilder<> &IRB, const DataLayout &DL,
7521 bool isOutput) {
7522 // For each assembly argument, we check its value for being initialized.
7523 // If the argument is a pointer, we assume it points to a single element
7524 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7525 // Each such pointer is instrumented with a call to the runtime library.
7526 Type *OpType = Operand->getType();
7527 // Check the operand value itself.
7528 insertCheckShadowOf(Operand, &I);
7529 if (!OpType->isPointerTy() || !isOutput) {
7530 assert(!isOutput);
7531 return;
7532 }
7533 if (!ElemTy->isSized())
7534 return;
7535 auto Size = DL.getTypeStoreSize(ElemTy);
7536 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7537 if (MS.CompileKernel) {
7538 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7539 } else {
7540 // ElemTy, derived from elementtype(), does not encode the alignment of
7541 // the pointer. Conservatively assume that the shadow memory is unaligned.
7542 // When Size is large, avoid StoreInst as it would expand to many
7543 // instructions.
7544 auto [ShadowPtr, _] =
7545 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7546 if (Size <= 32)
7547 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7548 else
7549 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7550 SizeVal, Align(1));
7551 }
7552 }
7553
7554 /// Get the number of output arguments returned by pointers.
7555 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7556 int NumRetOutputs = 0;
7557 int NumOutputs = 0;
7558 Type *RetTy = cast<Value>(CB)->getType();
7559 if (!RetTy->isVoidTy()) {
7560 // Register outputs are returned via the CallInst return value.
7561 auto *ST = dyn_cast<StructType>(RetTy);
7562 if (ST)
7563 NumRetOutputs = ST->getNumElements();
7564 else
7565 NumRetOutputs = 1;
7566 }
7567 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7568 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7569 switch (Info.Type) {
7570 case InlineAsm::isOutput:
7571 NumOutputs++;
7572 break;
7573 default:
7574 break;
7575 }
7576 }
7577 return NumOutputs - NumRetOutputs;
7578 }
7579
7580 void visitAsmInstruction(Instruction &I) {
7581 // Conservative inline assembly handling: check for poisoned shadow of
7582 // asm() arguments, then unpoison the result and all the memory locations
7583 // pointed to by those arguments.
7584 // An inline asm() statement in C++ contains lists of input and output
7585 // arguments used by the assembly code. These are mapped to operands of the
7586 // CallInst as follows:
7587 // - nR register outputs ("=r") are returned by value in a single structure
7588 // (SSA value of the CallInst);
7589 // - nO other outputs ("=m" and others) are returned by pointer as the
7590 // first nO operands of the CallInst;
7591 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7592 // remaining nI operands.
7593 // The total number of asm() arguments in the source is nR+nO+nI, and the
7594 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7595 // function to be called).
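// For illustration (a hypothetical statement, not taken from a test):
//   asm("insn %0, %1, %2" : "=r"(ret), "=m"(*p) : "r"(x));
// has nR=1 ("=r"), nO=1 ("=m") and nI=1 ("r"). The CallInst returns the
// "=r" value and has 3 operands: p, x and the InlineAsm callee, so
// getNumOutputArgs() yields 2 - 1 = 1; operand 0 (p) is then treated as an
// output pointer and operand 1 (x) as an input below.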
7596 const DataLayout &DL = F.getDataLayout();
7597 CallBase *CB = cast<CallBase>(&I);
7598 IRBuilder<> IRB(&I);
7599 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7600 int OutputArgs = getNumOutputArgs(IA, CB);
7601 // The last operand of a CallInst is the function itself.
7602 int NumOperands = CB->getNumOperands() - 1;
7603
7604 // Check the input arguments. We do this before unpoisoning the output
7605 // arguments, so that we don't overwrite uninitialized values before checking them.
7606 for (int i = OutputArgs; i < NumOperands; i++) {
7607 Value *Operand = CB->getOperand(i);
7608 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7609 /*isOutput*/ false);
7610 }
7611 // Unpoison output arguments. This must happen before the actual InlineAsm
7612 // call, so that the shadow for memory published in the asm() statement
7613 // remains valid.
7614 for (int i = 0; i < OutputArgs; i++) {
7615 Value *Operand = CB->getOperand(i);
7616 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7617 /*isOutput*/ true);
7618 }
7619
7620 setShadow(&I, getCleanShadow(&I));
7621 setOrigin(&I, getCleanOrigin());
7622 }
7623
7624 void visitFreezeInst(FreezeInst &I) {
7625 // Freeze always returns a fully defined value.
7626 setShadow(&I, getCleanShadow(&I));
7627 setOrigin(&I, getCleanOrigin());
7628 }
7629
7630 void visitInstruction(Instruction &I) {
7631 // Everything else: stop propagating and check for poisoned shadow.
7632 if (ClDumpStrictInstructions)
7633 dumpInst(I);
7634 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7635 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7636 Value *Operand = I.getOperand(i);
7637 if (Operand->getType()->isSized())
7638 insertCheckShadowOf(Operand, &I);
7639 }
7640 setShadow(&I, getCleanShadow(&I));
7641 setOrigin(&I, getCleanOrigin());
7642 }
7643};
7644
7645struct VarArgHelperBase : public VarArgHelper {
7646 Function &F;
7647 MemorySanitizer &MS;
7648 MemorySanitizerVisitor &MSV;
7649 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7650 const unsigned VAListTagSize;
7651
7652 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7653 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7654 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7655
7656 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7657 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7658 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7659 }
7660
7661 /// Compute the shadow address for a given va_arg.
7662 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7663 return IRB.CreatePtrAdd(
7664 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7665 }
7666
7667 /// Compute the shadow address for a given va_arg.
7668 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7669 unsigned ArgSize) {
7670 // Make sure we don't overflow __msan_va_arg_tls.
7671 if (ArgOffset + ArgSize > kParamTLSSize)
7672 return nullptr;
7673 return getShadowPtrForVAArgument(IRB, ArgOffset);
7674 }
7675
7676 /// Compute the origin address for a given va_arg.
7677 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7678 // getOriginPtrForVAArgument() is always called after
7679 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7680 // overflow.
7681 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7682 ConstantInt::get(MS.IntptrTy, ArgOffset),
7683 "_msarg_va_o");
7684 }
7685
7686 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7687 unsigned BaseOffset) {
7688 // The tail of __msan_va_arg_tls is not large enough to fit the full
7689 // value shadow, but it will be copied to the backup anyway. Make it
7690 // clean.
7691 if (BaseOffset >= kParamTLSSize)
7692 return;
7693 Value *TailSize =
7694 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7695 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7696 TailSize, Align(8));
7697 }
7698
7699 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7700 IRBuilder<> IRB(&I);
7701 Value *VAListTag = I.getArgOperand(0);
7702 const Align Alignment = Align(8);
7703 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7704 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7705 // Unpoison the whole __va_list_tag.
7706 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7707 VAListTagSize, Alignment, false);
7708 }
7709
7710 void visitVAStartInst(VAStartInst &I) override {
7711 if (F.getCallingConv() == CallingConv::Win64)
7712 return;
7713 VAStartInstrumentationList.push_back(&I);
7714 unpoisonVAListTagForInst(I);
7715 }
7716
7717 void visitVACopyInst(VACopyInst &I) override {
7718 if (F.getCallingConv() == CallingConv::Win64)
7719 return;
7720 unpoisonVAListTagForInst(I);
7721 }
7722};
7723
7724/// AMD64-specific implementation of VarArgHelper.
7725struct VarArgAMD64Helper : public VarArgHelperBase {
7726 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7727 // See a comment in visitCallBase for more details.
7728 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7729 static const unsigned AMD64FpEndOffsetSSE = 176;
7730 // If SSE is disabled, fp_offset in va_list is zero.
7731 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
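// In other words (per the SysV AMD64 ABI): bytes [0, 48) of the va_arg TLS
// mirror the 6 integer argument registers (6 x 8 bytes), bytes [48, 176)
// mirror the 8 SSE argument registers (8 x 16 bytes), and bytes from
// AMD64FpEndOffset onwards mirror the stack-passed (overflow) arguments.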
7732
7733 unsigned AMD64FpEndOffset;
7734 AllocaInst *VAArgTLSCopy = nullptr;
7735 AllocaInst *VAArgTLSOriginCopy = nullptr;
7736 Value *VAArgOverflowSize = nullptr;
7737
7738 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7739
7740 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7741 MemorySanitizerVisitor &MSV)
7742 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7743 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7744 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7745 if (Attr.isStringAttribute() &&
7746 (Attr.getKindAsString() == "target-features")) {
7747 if (Attr.getValueAsString().contains("-sse"))
7748 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7749 break;
7750 }
7751 }
7752 }
7753
7754 ArgKind classifyArgument(Value *arg) {
7755 // A very rough approximation of X86_64 argument classification rules.
7756 Type *T = arg->getType();
7757 if (T->isX86_FP80Ty())
7758 return AK_Memory;
7759 if (T->isFPOrFPVectorTy())
7760 return AK_FloatingPoint;
7761 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7762 return AK_GeneralPurpose;
7763 if (T->isPointerTy())
7764 return AK_GeneralPurpose;
7765 return AK_Memory;
7766 }
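// For example: double and <4 x float> are classified as AK_FloatingPoint,
// i32 and pointers as AK_GeneralPurpose, while x86_fp80 (long double) and
// i128 fall through to AK_Memory.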
7767
7768 // For VarArg functions, store the argument shadow in an ABI-specific format
7769 // that corresponds to va_list layout.
7770 // We do this because Clang lowers va_arg in the frontend, and this pass
7771 // only sees the low level code that deals with va_list internals.
7772 // A much easier alternative (provided that Clang emits va_arg instructions)
7773 // would have been to associate each live instance of va_list with a copy of
7774 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7775 // order.
7776 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7777 unsigned GpOffset = 0;
7778 unsigned FpOffset = AMD64GpEndOffset;
7779 unsigned OverflowOffset = AMD64FpEndOffset;
7780 const DataLayout &DL = F.getDataLayout();
7781
7782 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7783 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7784 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7785 if (IsByVal) {
7786 // ByVal arguments always go to the overflow area.
7787 // Fixed arguments passed through the overflow area will be stepped
7788 // over by va_start, so don't count them towards the offset.
7789 if (IsFixed)
7790 continue;
7791 assert(A->getType()->isPointerTy());
7792 Type *RealTy = CB.getParamByValType(ArgNo);
7793 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7794 uint64_t AlignedSize = alignTo(ArgSize, 8);
7795 unsigned BaseOffset = OverflowOffset;
7796 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7797 Value *OriginBase = nullptr;
7798 if (MS.TrackOrigins)
7799 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7800 OverflowOffset += AlignedSize;
7801
7802 if (OverflowOffset > kParamTLSSize) {
7803 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7804 continue; // We have no space to copy shadow there.
7805 }
7806
7807 Value *ShadowPtr, *OriginPtr;
7808 std::tie(ShadowPtr, OriginPtr) =
7809 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7810 /*isStore*/ false);
7811 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7812 kShadowTLSAlignment, ArgSize);
7813 if (MS.TrackOrigins)
7814 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7815 kShadowTLSAlignment, ArgSize);
7816 } else {
7817 ArgKind AK = classifyArgument(A);
7818 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7819 AK = AK_Memory;
7820 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7821 AK = AK_Memory;
7822 Value *ShadowBase, *OriginBase = nullptr;
7823 switch (AK) {
7824 case AK_GeneralPurpose:
7825 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7826 if (MS.TrackOrigins)
7827 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7828 GpOffset += 8;
7829 assert(GpOffset <= kParamTLSSize);
7830 break;
7831 case AK_FloatingPoint:
7832 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7833 if (MS.TrackOrigins)
7834 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7835 FpOffset += 16;
7836 assert(FpOffset <= kParamTLSSize);
7837 break;
7838 case AK_Memory:
7839 if (IsFixed)
7840 continue;
7841 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7842 uint64_t AlignedSize = alignTo(ArgSize, 8);
7843 unsigned BaseOffset = OverflowOffset;
7844 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7845 if (MS.TrackOrigins) {
7846 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7847 }
7848 OverflowOffset += AlignedSize;
7849 if (OverflowOffset > kParamTLSSize) {
7850 // We have no space to copy shadow there.
7851 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7852 continue;
7853 }
7854 }
7855 // Take fixed arguments into account for GpOffset and FpOffset,
7856 // but don't actually store shadows for them.
7857 // TODO(glider): don't call get*PtrForVAArgument() for them.
7858 if (IsFixed)
7859 continue;
7860 Value *Shadow = MSV.getShadow(A);
7861 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7862 if (MS.TrackOrigins) {
7863 Value *Origin = MSV.getOrigin(A);
7864 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7865 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7866 std::max(kShadowTLSAlignment, kMinOriginAlignment));
7867 }
7868 }
7869 }
7870 Constant *OverflowSize =
7871 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7872 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7873 }
7874
7875 void finalizeInstrumentation() override {
7876 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7877 "finalizeInstrumentation called twice");
7878 if (!VAStartInstrumentationList.empty()) {
7879 // If there is a va_start in this function, make a backup copy of
7880 // va_arg_tls somewhere in the function entry block.
7881 IRBuilder<> IRB(MSV.FnPrologueEnd);
7882 VAArgOverflowSize =
7883 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7884 Value *CopySize = IRB.CreateAdd(
7885 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7886 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7887 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7888 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7889 CopySize, kShadowTLSAlignment, false);
7890
7891 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7892 Intrinsic::umin, CopySize,
7893 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7894 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7895 kShadowTLSAlignment, SrcSize);
7896 if (MS.TrackOrigins) {
7897 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7898 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7899 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7900 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7901 }
7902 }
7903
7904 // Instrument va_start.
7905 // Copy va_list shadow from the backup copy of the TLS contents.
7906 for (CallInst *OrigInst : VAStartInstrumentationList) {
7907 NextNodeIRBuilder IRB(OrigInst);
7908 Value *VAListTag = OrigInst->getArgOperand(0);
7909
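// The offsets 16 and 8 below follow the SysV AMD64 __va_list_tag layout:
//   struct __va_list_tag {
//     unsigned int gp_offset;      // offset 0
//     unsigned int fp_offset;      // offset 4
//     void *overflow_arg_area;     // offset 8:  stack-passed arguments
//     void *reg_save_area;         // offset 16: 48 GP + 128 FP bytes
//   };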
7910 Value *RegSaveAreaPtrPtr =
7911 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
7912 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7913 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7914 const Align Alignment = Align(16);
7915 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7916 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7917 Alignment, /*isStore*/ true);
7918 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7919 AMD64FpEndOffset);
7920 if (MS.TrackOrigins)
7921 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7922 Alignment, AMD64FpEndOffset);
7923 Value *OverflowArgAreaPtrPtr =
7924 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
7925 Value *OverflowArgAreaPtr =
7926 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7927 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7928 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7929 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7930 Alignment, /*isStore*/ true);
7931 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7932 AMD64FpEndOffset);
7933 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7934 VAArgOverflowSize);
7935 if (MS.TrackOrigins) {
7936 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7937 AMD64FpEndOffset);
7938 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7939 VAArgOverflowSize);
7940 }
7941 }
7942 }
7943};
7944
7945/// AArch64-specific implementation of VarArgHelper.
7946struct VarArgAArch64Helper : public VarArgHelperBase {
7947 static const unsigned kAArch64GrArgSize = 64;
7948 static const unsigned kAArch64VrArgSize = 128;
7949
7950 static const unsigned AArch64GrBegOffset = 0;
7951 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7952 // Make VR space aligned to 16 bytes.
7953 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7954 static const unsigned AArch64VrEndOffset =
7955 AArch64VrBegOffset + kAArch64VrArgSize;
7956 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
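// Shadow TLS layout implied by the constants above: bytes [0, 64) mirror
// the general-purpose argument registers x0-x7 (8 x 8 bytes), bytes
// [64, 192) mirror the FP/SIMD argument registers q0-q7 (8 x 16 bytes),
// and bytes from AArch64VAEndOffset onwards mirror stack-passed arguments.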
7957
7958 AllocaInst *VAArgTLSCopy = nullptr;
7959 Value *VAArgOverflowSize = nullptr;
7960
7961 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7962
7963 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7964 MemorySanitizerVisitor &MSV)
7965 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7966
7967 // A very rough approximation of aarch64 argument classification rules.
7968 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7969 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7970 return {AK_GeneralPurpose, 1};
7971 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7972 return {AK_FloatingPoint, 1};
7973
7974 if (T->isArrayTy()) {
7975 auto R = classifyArgument(T->getArrayElementType());
7976 R.second *= T->getScalarType()->getArrayNumElements();
7977 return R;
7978 }
7979
7980 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7981 auto R = classifyArgument(FV->getScalarType());
7982 R.second *= FV->getNumElements();
7983 return R;
7984 }
7985
7986 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7987 return {AK_Memory, 0};
7988 }
7989
7990 // The instrumentation stores the argument shadow in a non ABI-specific
7991 // format because it does not know which argument is named (since Clang,
7992 // as in the x86_64 case, lowers va_arg in the frontend and this pass only
7993 // sees the low level code that deals with va_list internals).
7994 // The first eight GR registers are saved in the first 64 bytes of the
7995 // va_arg TLS array, followed by the first eight FP/SIMD registers, and
7996 // then the remaining arguments.
7997 // Using constant offsets within the va_arg TLS array allows a fast copy
7998 // in finalizeInstrumentation().
7999 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8000 unsigned GrOffset = AArch64GrBegOffset;
8001 unsigned VrOffset = AArch64VrBegOffset;
8002 unsigned OverflowOffset = AArch64VAEndOffset;
8003
8004 const DataLayout &DL = F.getDataLayout();
8005 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8006 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8007 auto [AK, RegNum] = classifyArgument(A->getType());
8008 if (AK == AK_GeneralPurpose &&
8009 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
8010 AK = AK_Memory;
8011 if (AK == AK_FloatingPoint &&
8012 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
8013 AK = AK_Memory;
8014 Value *Base;
8015 switch (AK) {
8016 case AK_GeneralPurpose:
8017 Base = getShadowPtrForVAArgument(IRB, GrOffset);
8018 GrOffset += 8 * RegNum;
8019 break;
8020 case AK_FloatingPoint:
8021 Base = getShadowPtrForVAArgument(IRB, VrOffset);
8022 VrOffset += 16 * RegNum;
8023 break;
8024 case AK_Memory:
8025 // Don't count fixed arguments in the overflow area - va_start will
8026 // skip right over them.
8027 if (IsFixed)
8028 continue;
8029 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8030 uint64_t AlignedSize = alignTo(ArgSize, 8);
8031 unsigned BaseOffset = OverflowOffset;
8032 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
8033 OverflowOffset += AlignedSize;
8034 if (OverflowOffset > kParamTLSSize) {
8035 // We have no space to copy shadow there.
8036 CleanUnusedTLS(IRB, Base, BaseOffset);
8037 continue;
8038 }
8039 break;
8040 }
8041 // Count Gp/Vr fixed arguments to their respective offsets, but don't
8042 // bother to actually store a shadow.
8043 if (IsFixed)
8044 continue;
8045 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8046 }
8047 Constant *OverflowSize =
8048 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
8049 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8050 }
8051
8052 // Retrieve a va_list field of 'void*' size.
8053 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8054 Value *SaveAreaPtrPtr =
8055 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8056 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
8057 }
8058
8059 // Retrieve a va_list field of 'int' size.
8060 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8061 Value *SaveAreaPtr =
8062 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8063 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
8064 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
8065 }
8066
8067 void finalizeInstrumentation() override {
8068 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8069 "finalizeInstrumentation called twice");
8070 if (!VAStartInstrumentationList.empty()) {
8071 // If there is a va_start in this function, make a backup copy of
8072 // va_arg_tls somewhere in the function entry block.
8073 IRBuilder<> IRB(MSV.FnPrologueEnd);
8074 VAArgOverflowSize =
8075 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8076 Value *CopySize = IRB.CreateAdd(
8077 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
8078 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8079 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8080 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8081 CopySize, kShadowTLSAlignment, false);
8082
8083 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8084 Intrinsic::umin, CopySize,
8085 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8086 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8087 kShadowTLSAlignment, SrcSize);
8088 }
8089
8090 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
8091 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
8092
8093 // Instrument va_start, copy va_list shadow from the backup copy of
8094 // the TLS contents.
8095 for (CallInst *OrigInst : VAStartInstrumentationList) {
8096 NextNodeIRBuilder IRB(OrigInst);
8097
8098 Value *VAListTag = OrigInst->getArgOperand(0);
8099
8100 // The variadic ABI for AArch64 creates two areas to save the incoming
8101 // argument registers (one for 64-bit general register xn-x7 and another
8102 // for 128-bit FP/SIMD vn-v7).
8103 // We need then to propagate the shadow arguments on both regions
8104 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
8105 // The remaining arguments are saved on shadow for 'va::stack'.
8106 // One caveat is that only the non-named arguments need to be
8107 // propagated; however, the call site instrumentation saves 'all' the
8108 // arguments. So, to copy the shadow values from the va_arg TLS array,
8109 // we need to adjust the offset for both the GR and VR regions based on
8110 // the __{gr,vr}_offs values (which are set according to the incoming
8111 // named arguments).
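// For reference, the AAPCS64 va_list that the offsets below index into is
//   struct __va_list {
//     void *__stack;   // offset 0:  next stack-passed argument
//     void *__gr_top;  // offset 8:  end of the GR register save area
//     void *__vr_top;  // offset 16: end of the VR register save area
//     int __gr_offs;   // offset 24: negative offset from __gr_top
//     int __vr_offs;   // offset 28: negative offset from __vr_top
//   };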
8112 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
8113
8114 // Read the stack pointer from the va_list.
8115 Value *StackSaveAreaPtr =
8116 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
8117
8118 // Read both the __gr_top and __gr_off and add them up.
8119 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
8120 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
8121
8122 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
8123 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
8124
8125 // Read both the __vr_top and __vr_off and add them up.
8126 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
8127 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
8128
8129 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
8130 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
8131
8132 // The instrumentation does not know how many named arguments are being
8133 // used, and at the call site all the arguments were saved. Since __gr_offs
8134 // is defined as '0 - ((8 - named_gr) * 8)', the idea is to propagate only
8135 // the variadic arguments by skipping the shadow bytes of named arguments.
8136 Value *GrRegSaveAreaShadowPtrOff =
8137 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
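// Worked example (illustrative): with 2 named GP arguments, va_start sets
// __gr_offs = -(8 - 2) * 8 = -48, so the offset computed above is
// 64 + (-48) = 16. The named arguments occupy the first 2 * 8 = 16 bytes of
// the GR portion of VAArgTLSCopy, so the copy below starts right after
// them, and GrCopySize becomes 64 - 16 = 48 bytes.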
8138
8139 Value *GrRegSaveAreaShadowPtr =
8140 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8141 Align(8), /*isStore*/ true)
8142 .first;
8143
8144 Value *GrSrcPtr =
8145 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
8146 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
8147
8148 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
8149 GrCopySize);
8150
8151 // Again, but for FP/SIMD values.
8152 Value *VrRegSaveAreaShadowPtrOff =
8153 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
8154
8155 Value *VrRegSaveAreaShadowPtr =
8156 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8157 Align(8), /*isStore*/ true)
8158 .first;
8159
8160 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
8161 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
8162 IRB.getInt32(AArch64VrBegOffset)),
8163 VrRegSaveAreaShadowPtrOff);
8164 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
8165
8166 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
8167 VrCopySize);
8168
8169 // And finally for remaining arguments.
8170 Value *StackSaveAreaShadowPtr =
8171 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
8172 Align(16), /*isStore*/ true)
8173 .first;
8174
8175 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
8176 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
8177
8178 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
8179 Align(16), VAArgOverflowSize);
8180 }
8181 }
8182};
8183
8184/// PowerPC64-specific implementation of VarArgHelper.
8185struct VarArgPowerPC64Helper : public VarArgHelperBase {
8186 AllocaInst *VAArgTLSCopy = nullptr;
8187 Value *VAArgSize = nullptr;
8188
8189 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
8190 MemorySanitizerVisitor &MSV)
8191 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
8192
8193 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8194 // For PowerPC, we need to deal with alignment of stack arguments -
8195 // they are mostly aligned to 8 bytes, but vectors and i128 arrays
8196 // are aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
8197 // For that reason, we compute the current offset from the stack pointer
8198 // (which is always properly aligned) and the offset of the first vararg,
8199 // then subtract them.
8200 unsigned VAArgBase;
8201 Triple TargetTriple(F.getParent()->getTargetTriple());
8202 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
8203 // and 32 bytes for ABIv2. This is usually determined by target
8204 // endianness, but in theory could be overridden by function attribute.
8205 if (TargetTriple.isPPC64ELFv2ABI())
8206 VAArgBase = 32;
8207 else
8208 VAArgBase = 48;
8209 unsigned VAArgOffset = VAArgBase;
8210 const DataLayout &DL = F.getDataLayout();
8211 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8212 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8213 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8214 if (IsByVal) {
8215 assert(A->getType()->isPointerTy());
8216 Type *RealTy = CB.getParamByValType(ArgNo);
8217 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8218 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
8219 if (ArgAlign < 8)
8220 ArgAlign = Align(8);
8221 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8222 if (!IsFixed) {
8223 Value *Base =
8224 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8225 if (Base) {
8226 Value *AShadowPtr, *AOriginPtr;
8227 std::tie(AShadowPtr, AOriginPtr) =
8228 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8229 kShadowTLSAlignment, /*isStore*/ false);
8230
8231 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8232 kShadowTLSAlignment, ArgSize);
8233 }
8234 }
8235 VAArgOffset += alignTo(ArgSize, Align(8));
8236 } else {
8237 Value *Base;
8238 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8239 Align ArgAlign = Align(8);
8240 if (A->getType()->isArrayTy()) {
8241 // Arrays are aligned to element size, except for long double
8242 // arrays, which are aligned to 8 bytes.
8243 Type *ElementTy = A->getType()->getArrayElementType();
8244 if (!ElementTy->isPPC_FP128Ty())
8245 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8246 } else if (A->getType()->isVectorTy()) {
8247 // Vectors are naturally aligned.
8248 ArgAlign = Align(ArgSize);
8249 }
8250 if (ArgAlign < 8)
8251 ArgAlign = Align(8);
8252 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8253 if (DL.isBigEndian()) {
8254 // Adjust the shadow for arguments with size < 8 to match the
8255 // placement of bits on a big-endian system.
8256 if (ArgSize < 8)
8257 VAArgOffset += (8 - ArgSize);
8258 }
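// Worked example (illustrative): a 4-byte int passed in an 8-byte
// big-endian stack slot occupies bytes 4..7 of the slot, so the shadow
// offset is bumped by 8 - 4 = 4 to line the shadow up with the value.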
8259 if (!IsFixed) {
8260 Base =
8261 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8262 if (Base)
8263 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8264 }
8265 VAArgOffset += ArgSize;
8266 VAArgOffset = alignTo(VAArgOffset, Align(8));
8267 }
8268 if (IsFixed)
8269 VAArgBase = VAArgOffset;
8270 }
8271
8272 Constant *TotalVAArgSize =
8273 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8274 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8275 // new class member, i.e. it holds the total size of all VarArgs.
8276 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8277 }
8278
8279 void finalizeInstrumentation() override {
8280 assert(!VAArgSize && !VAArgTLSCopy &&
8281 "finalizeInstrumentation called twice");
8282 IRBuilder<> IRB(MSV.FnPrologueEnd);
8283 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8284 Value *CopySize = VAArgSize;
8285
8286 if (!VAStartInstrumentationList.empty()) {
8287 // If there is a va_start in this function, make a backup copy of
8288 // va_arg_tls somewhere in the function entry block.
8289
8290 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8291 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8292 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8293 CopySize, kShadowTLSAlignment, false);
8294
8295 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8296 Intrinsic::umin, CopySize,
8297 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
8298 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8299 kShadowTLSAlignment, SrcSize);
8300 }
8301
8302 // Instrument va_start.
8303 // Copy va_list shadow from the backup copy of the TLS contents.
8304 for (CallInst *OrigInst : VAStartInstrumentationList) {
8305 NextNodeIRBuilder IRB(OrigInst);
8306 Value *VAListTag = OrigInst->getArgOperand(0);
8307 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8308
8309 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8310
8311 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8312 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8313 const DataLayout &DL = F.getDataLayout();
8314 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8315 const Align Alignment = Align(IntptrSize);
8316 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8317 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8318 Alignment, /*isStore*/ true);
8319 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8320 CopySize);
8321 }
8322 }
8323};
8324
8325/// PowerPC32-specific implementation of VarArgHelper.
8326struct VarArgPowerPC32Helper : public VarArgHelperBase {
8327 AllocaInst *VAArgTLSCopy = nullptr;
8328 Value *VAArgSize = nullptr;
8329
8330 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8331 MemorySanitizerVisitor &MSV)
8332 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8333
8334 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8335 unsigned VAArgBase;
8336 // Parameter save area is 8 bytes from frame pointer in PPC32
8337 VAArgBase = 8;
8338 unsigned VAArgOffset = VAArgBase;
8339 const DataLayout &DL = F.getDataLayout();
8340 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8341 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8342 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8343 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8344 if (IsByVal) {
8345 assert(A->getType()->isPointerTy());
8346 Type *RealTy = CB.getParamByValType(ArgNo);
8347 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8348 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8349 if (ArgAlign < IntptrSize)
8350 ArgAlign = Align(IntptrSize);
8351 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8352 if (!IsFixed) {
8353 Value *Base =
8354 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8355 if (Base) {
8356 Value *AShadowPtr, *AOriginPtr;
8357 std::tie(AShadowPtr, AOriginPtr) =
8358 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8359 kShadowTLSAlignment, /*isStore*/ false);
8360
8361 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8362 kShadowTLSAlignment, ArgSize);
8363 }
8364 }
8365 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8366 } else {
8367 Value *Base;
8368 Type *ArgTy = A->getType();
8369
8370 // On PPC32, floating-point variable arguments are stored in a separate
8371 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
8372 // them, as they will be found when checking call arguments.
8373 if (!ArgTy->isFloatingPointTy()) {
8374 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
8375 Align ArgAlign = Align(IntptrSize);
8376 if (ArgTy->isArrayTy()) {
8377 // Arrays are aligned to element size, except for long double
8378 // arrays, which are aligned to 8 bytes.
8379 Type *ElementTy = ArgTy->getArrayElementType();
8380 if (!ElementTy->isPPC_FP128Ty())
8381 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8382 } else if (ArgTy->isVectorTy()) {
8383 // Vectors are naturally aligned.
8384 ArgAlign = Align(ArgSize);
8385 }
8386 if (ArgAlign < IntptrSize)
8387 ArgAlign = Align(IntptrSize);
8388 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8389 if (DL.isBigEndian()) {
8390 // Adjust the shadow for arguments with size < IntptrSize to match
8391 // the placement of bits on a big-endian system.
8392 if (ArgSize < IntptrSize)
8393 VAArgOffset += (IntptrSize - ArgSize);
8394 }
8395 if (!IsFixed) {
8396 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
8397 ArgSize);
8398 if (Base)
8399 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
8400 kShadowTLSAlignment);
8401 }
8402 VAArgOffset += ArgSize;
8403 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8404 }
8405 }
8406 }
8407
8408 Constant *TotalVAArgSize =
8409 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8410 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8411 // new class member, i.e. it holds the total size of all VarArgs.
8412 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8413 }
8414
8415 void finalizeInstrumentation() override {
8416 assert(!VAArgSize && !VAArgTLSCopy &&
8417 "finalizeInstrumentation called twice");
8418 IRBuilder<> IRB(MSV.FnPrologueEnd);
8419 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8420 Value *CopySize = VAArgSize;
8421
8422 if (!VAStartInstrumentationList.empty()) {
8423 // If there is a va_start in this function, make a backup copy of
8424 // va_arg_tls somewhere in the function entry block.
8425
8426 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8427 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8428 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8429 CopySize, kShadowTLSAlignment, false);
8430
8431 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8432 Intrinsic::umin, CopySize,
8433 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8434 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8435 kShadowTLSAlignment, SrcSize);
8436 }
8437
8438 // Instrument va_start.
8439 // Copy va_list shadow from the backup copy of the TLS contents.
8440 for (CallInst *OrigInst : VAStartInstrumentationList) {
8441 NextNodeIRBuilder IRB(OrigInst);
8442 Value *VAListTag = OrigInst->getArgOperand(0);
8443 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8444 Value *RegSaveAreaSize = CopySize;
8445
8446 // In PPC32 va_list_tag is a struct
8447 RegSaveAreaPtrPtr =
8448 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
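// For reference, the PPC32 SysV va_list that offsets 8 (here) and 4 (in
// the overflow-area copy below) index into is roughly
//   struct __va_list_tag {
//     unsigned char gpr;        // offset 0
//     unsigned char fpr;        // offset 1
//     unsigned short reserved;  // offset 2
//     void *overflow_arg_area;  // offset 4
//     void *reg_save_area;      // offset 8
//   };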
8449
8450 // On PPC32, reg_save_area can only hold 32 bytes of data
8451 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8452 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8453
8454 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8455 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8456
8457 const DataLayout &DL = F.getDataLayout();
8458 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8459 const Align Alignment = Align(IntptrSize);
8460
8461 { // Copy reg save area
8462 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8463 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8464 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8465 Alignment, /*isStore*/ true);
8466 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8467 Alignment, RegSaveAreaSize);
8468
8469 RegSaveAreaShadowPtr =
8470 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8471 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8472 ConstantInt::get(MS.IntptrTy, 32));
8473 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8474 // We fill the FP shadow with zeroes, as uninitialized FP args should
8475 // have been reported during the call base check.
8476 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8477 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8478 }
8479
8480 { // Copy overflow area
8481 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8482 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8483
8484 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8485 OverflowAreaPtrPtr =
8486 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8487 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8488
8489 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8490
8491 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8492 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8493 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8494 Alignment, /*isStore*/ true);
8495
8496 Value *OverflowVAArgTLSCopyPtr =
8497 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8498 OverflowVAArgTLSCopyPtr =
8499 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8500
8501 OverflowVAArgTLSCopyPtr =
8502 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8503 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8504 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8505 }
8506 }
8507 }
8508};
8509
8510/// SystemZ-specific implementation of VarArgHelper.
8511struct VarArgSystemZHelper : public VarArgHelperBase {
8512 static const unsigned SystemZGpOffset = 16;
8513 static const unsigned SystemZGpEndOffset = 56;
8514 static const unsigned SystemZFpOffset = 128;
8515 static const unsigned SystemZFpEndOffset = 160;
8516 static const unsigned SystemZMaxVrArgs = 8;
8517 static const unsigned SystemZRegSaveAreaSize = 160;
8518 static const unsigned SystemZOverflowOffset = 160;
8519 static const unsigned SystemZVAListTagSize = 32;
8520 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8521 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
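// Decoding the constants above (per the s390x ELF ABI): within the
// 160-byte register save area, the GP argument registers r2-r6 are saved
// at offsets 16..55 (5 x 8 bytes) and the FP argument registers f0, f2,
// f4, f6 at offsets 128..159 (4 x 8 bytes); stack-passed (overflow)
// arguments start right after the register save area, at offset 160.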
8522
8523 bool IsSoftFloatABI;
8524 AllocaInst *VAArgTLSCopy = nullptr;
8525 AllocaInst *VAArgTLSOriginCopy = nullptr;
8526 Value *VAArgOverflowSize = nullptr;
8527
8528 enum class ArgKind {
8529 GeneralPurpose,
8530 FloatingPoint,
8531 Vector,
8532 Memory,
8533 Indirect,
8534 };
8535
8536 enum class ShadowExtension { None, Zero, Sign };
8537
8538 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8539 MemorySanitizerVisitor &MSV)
8540 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8541 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8542
8543 ArgKind classifyArgument(Type *T) {
8544 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8545 // only a few possibilities of what it can be. In particular, enums, single
8546 // element structs and large types have already been taken care of.
8547
8548 // Some i128 and fp128 arguments are converted to pointers only in the
8549 // back end.
8550 if (T->isIntegerTy(128) || T->isFP128Ty())
8551 return ArgKind::Indirect;
8552 if (T->isFloatingPointTy())
8553 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8554 if (T->isIntegerTy() || T->isPointerTy())
8555 return ArgKind::GeneralPurpose;
8556 if (T->isVectorTy())
8557 return ArgKind::Vector;
8558 return ArgKind::Memory;
8559 }
8560
8561 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8562 // ABI says: "One of the simple integer types no more than 64 bits wide.
8563 // ... If such an argument is shorter than 64 bits, replace it by a full
8564 // 64-bit integer representing the same number, using sign or zero
8565 // extension". Shadow for an integer argument has the same type as the
8566 // argument itself, so it can be sign or zero extended as well.
8567 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8568 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8569 if (ZExt) {
8570 assert(!SExt);
8571 return ShadowExtension::Zero;
8572 }
8573 if (SExt) {
8574 assert(!ZExt);
8575 return ShadowExtension::Sign;
8576 }
8577 return ShadowExtension::None;
8578 }
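// Example (illustrative): a variadic i32 passed with the signext attribute
// gets its shadow sign-extended to i64 and stored at GpOffset, covering the
// whole 8-byte slot. Without an extension attribute the i32 shadow is
// stored as-is at GpOffset + GapSize (GapSize = 8 - 4 = 4), matching where
// the 4 significant bytes of a big-endian 64-bit slot live.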
8579
8580 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8581 unsigned GpOffset = SystemZGpOffset;
8582 unsigned FpOffset = SystemZFpOffset;
8583 unsigned VrIndex = 0;
8584 unsigned OverflowOffset = SystemZOverflowOffset;
8585 const DataLayout &DL = F.getDataLayout();
8586 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8587 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8588 // SystemZABIInfo does not produce ByVal parameters.
8589 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8590 Type *T = A->getType();
8591 ArgKind AK = classifyArgument(T);
8592 if (AK == ArgKind::Indirect) {
8593 T = MS.PtrTy;
8594 AK = ArgKind::GeneralPurpose;
8595 }
8596 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8597 AK = ArgKind::Memory;
8598 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8599 AK = ArgKind::Memory;
8600 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8601 AK = ArgKind::Memory;
8602 Value *ShadowBase = nullptr;
8603 Value *OriginBase = nullptr;
8604 ShadowExtension SE = ShadowExtension::None;
8605 switch (AK) {
8606 case ArgKind::GeneralPurpose: {
8607 // Always keep track of GpOffset, but store shadow only for varargs.
8608 uint64_t ArgSize = 8;
8609 if (GpOffset + ArgSize <= kParamTLSSize) {
8610 if (!IsFixed) {
8611 SE = getShadowExtension(CB, ArgNo);
8612 uint64_t GapSize = 0;
8613 if (SE == ShadowExtension::None) {
8614 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8615 assert(ArgAllocSize <= ArgSize);
8616 GapSize = ArgSize - ArgAllocSize;
8617 }
8618 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8619 if (MS.TrackOrigins)
8620 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8621 }
8622 GpOffset += ArgSize;
8623 } else {
8624 GpOffset = kParamTLSSize;
8625 }
8626 break;
8627 }
8628 case ArgKind::FloatingPoint: {
8629 // Always keep track of FpOffset, but store shadow only for varargs.
8630 uint64_t ArgSize = 8;
8631 if (FpOffset + ArgSize <= kParamTLSSize) {
8632 if (!IsFixed) {
8633 // PoP says: "A short floating-point datum requires only the
8634 // left-most 32 bit positions of a floating-point register".
8635 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8636 // don't extend shadow and don't mind the gap.
8637 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8638 if (MS.TrackOrigins)
8639 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8640 }
8641 FpOffset += ArgSize;
8642 } else {
8643 FpOffset = kParamTLSSize;
8644 }
8645 break;
8646 }
8647 case ArgKind::Vector: {
8648 // Keep track of VrIndex. No need to store shadow, since vector varargs
8649 // go through AK_Memory.
8650 assert(IsFixed);
8651 VrIndex++;
8652 break;
8653 }
8654 case ArgKind::Memory: {
8655 // Keep track of OverflowOffset and store shadow only for varargs.
8656 // Ignore fixed args, since we need to copy only the vararg portion of
8657 // the overflow area shadow.
8658 if (!IsFixed) {
8659 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8660 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8661 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8662 SE = getShadowExtension(CB, ArgNo);
8663 uint64_t GapSize =
8664 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8665 ShadowBase =
8666 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8667 if (MS.TrackOrigins)
8668 OriginBase =
8669 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8670 OverflowOffset += ArgSize;
8671 } else {
8672 OverflowOffset = kParamTLSSize;
8673 }
8674 }
8675 break;
8676 }
8677 case ArgKind::Indirect:
8678 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8679 }
8680 if (ShadowBase == nullptr)
8681 continue;
8682 Value *Shadow = MSV.getShadow(A);
8683 if (SE != ShadowExtension::None)
8684 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8685 /*Signed*/ SE == ShadowExtension::Sign);
8686 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8687 IRB.CreateStore(Shadow, ShadowBase);
8688 if (MS.TrackOrigins) {
8689 Value *Origin = MSV.getOrigin(A);
8690 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8691 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8692 kMinOriginAlignment);
8693 }
8694 }
8695 Constant *OverflowSize = ConstantInt::get(
8696 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8697 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8698 }
8699
8700 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8701 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8702 IRB.CreateAdd(
8703 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8704 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8705 MS.PtrTy);
8706 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8707 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8708 const Align Alignment = Align(8);
8709 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8710 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8711 /*isStore*/ true);
8712 // TODO(iii): copy only fragments filled by visitCallBase()
8713 // TODO(iii): support packed-stack && !use-soft-float
8714 // For use-soft-float functions, it is enough to copy just the GPRs.
8715 unsigned RegSaveAreaSize =
8716 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8717 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8718 RegSaveAreaSize);
8719 if (MS.TrackOrigins)
8720 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8721 Alignment, RegSaveAreaSize);
8722 }
8723
8724 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8725 // don't know the real overflow size and can't clear shadow beyond kParamTLSSize.
8726 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8727 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8728 IRB.CreateAdd(
8729 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8730 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8731 MS.PtrTy);
8732 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8733 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8734 const Align Alignment = Align(8);
8735 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8736 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8737 Alignment, /*isStore*/ true);
8738 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8739 SystemZOverflowOffset);
8740 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8741 VAArgOverflowSize);
8742 if (MS.TrackOrigins) {
8743 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8744 SystemZOverflowOffset);
8745 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8746 VAArgOverflowSize);
8747 }
8748 }
8749
8750 void finalizeInstrumentation() override {
8751 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8752 "finalizeInstrumentation called twice");
8753 if (!VAStartInstrumentationList.empty()) {
8754 // If there is a va_start in this function, make a backup copy of
8755 // va_arg_tls somewhere in the function entry block.
8756 IRBuilder<> IRB(MSV.FnPrologueEnd);
8757 VAArgOverflowSize =
8758 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8759 Value *CopySize =
8760 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8761 VAArgOverflowSize);
8762 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8763 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8764 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8765 CopySize, kShadowTLSAlignment, false);
8766
8767 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8768 Intrinsic::umin, CopySize,
8769 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8770 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8771 kShadowTLSAlignment, SrcSize);
8772 if (MS.TrackOrigins) {
8773 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8774 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8775 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8776 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8777 }
8778 }
8779
8780 // Instrument va_start.
8781 // Copy va_list shadow from the backup copy of the TLS contents.
8782 for (CallInst *OrigInst : VAStartInstrumentationList) {
8783 NextNodeIRBuilder IRB(OrigInst);
8784 Value *VAListTag = OrigInst->getArgOperand(0);
8785 copyRegSaveArea(IRB, VAListTag);
8786 copyOverflowArea(IRB, VAListTag);
8787 }
8788 }
8789};
8790
8791/// i386-specific implementation of VarArgHelper.
8792struct VarArgI386Helper : public VarArgHelperBase {
8793 AllocaInst *VAArgTLSCopy = nullptr;
8794 Value *VAArgSize = nullptr;
8795
8796 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8797 MemorySanitizerVisitor &MSV)
8798 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8799
8800 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8801 const DataLayout &DL = F.getDataLayout();
8802 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8803 unsigned VAArgOffset = 0;
8804 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8805 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8806 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8807 if (IsByVal) {
8808 assert(A->getType()->isPointerTy());
8809 Type *RealTy = CB.getParamByValType(ArgNo);
8810 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8811 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8812 if (ArgAlign < IntptrSize)
8813 ArgAlign = Align(IntptrSize);
8814 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8815 if (!IsFixed) {
8816 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8817 if (Base) {
8818 Value *AShadowPtr, *AOriginPtr;
8819 std::tie(AShadowPtr, AOriginPtr) =
8820 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8821 kShadowTLSAlignment, /*isStore*/ false);
8822
8823 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8824 kShadowTLSAlignment, ArgSize);
8825 }
8826 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8827 }
8828 } else {
8829 Value *Base;
8830 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8831 Align ArgAlign = Align(IntptrSize);
8832 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8833 if (DL.isBigEndian()) {
8834 // Adjust the shadow for arguments with size < IntptrSize to match
8835 // the placement of bits on a big-endian system.
8836 if (ArgSize < IntptrSize)
8837 VAArgOffset += (IntptrSize - ArgSize);
8838 }
8839 if (!IsFixed) {
8840 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8841 if (Base)
8842 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8843 VAArgOffset += ArgSize;
8844 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8845 }
8846 }
8847 }
8848
8849 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8850 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8851 // new class member, i.e. it holds the total size of all VarArgs.
8852 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8853 }
8854
8855 void finalizeInstrumentation() override {
8856 assert(!VAArgSize && !VAArgTLSCopy &&
8857 "finalizeInstrumentation called twice");
8858 IRBuilder<> IRB(MSV.FnPrologueEnd);
8859 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8860 Value *CopySize = VAArgSize;
8861
8862 if (!VAStartInstrumentationList.empty()) {
8863 // If there is a va_start in this function, make a backup copy of
8864 // va_arg_tls somewhere in the function entry block.
8865 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8866 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8867 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8868 CopySize, kShadowTLSAlignment, false);
8869
8870 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8871 Intrinsic::umin, CopySize,
8872 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8873 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8874 kShadowTLSAlignment, SrcSize);
8875 }
8876
8877 // Instrument va_start.
8878 // Copy va_list shadow from the backup copy of the TLS contents.
8879 for (CallInst *OrigInst : VAStartInstrumentationList) {
8880 NextNodeIRBuilder IRB(OrigInst);
8881 Value *VAListTag = OrigInst->getArgOperand(0);
8882 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8883 Value *RegSaveAreaPtrPtr =
8884 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8885 PointerType::get(*MS.C, 0));
8886 Value *RegSaveAreaPtr =
8887 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8888 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8889 const DataLayout &DL = F.getDataLayout();
8890 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8891 const Align Alignment = Align(IntptrSize);
8892 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8893 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8894 Alignment, /*isStore*/ true);
8895 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8896 CopySize);
8897 }
8898 }
8899};
8900
8901/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8902/// LoongArch64.
8903struct VarArgGenericHelper : public VarArgHelperBase {
8904 AllocaInst *VAArgTLSCopy = nullptr;
8905 Value *VAArgSize = nullptr;
8906
8907 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8908 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8909 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8910
8911 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8912 unsigned VAArgOffset = 0;
8913 const DataLayout &DL = F.getDataLayout();
8914 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8915 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8916 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8917 if (IsFixed)
8918 continue;
8919 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8920 if (DL.isBigEndian()) {
8921 // Adjust the shadow for arguments with size < IntptrSize to match the
8922 // placement of bits on a big-endian system.
8923 if (ArgSize < IntptrSize)
8924 VAArgOffset += (IntptrSize - ArgSize);
8925 }
8926 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8927 VAArgOffset += ArgSize;
8928 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8929 if (!Base)
8930 continue;
8931 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8932 }
8933
8934 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8935 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8936 // new class member, i.e. it holds the total size of all VarArgs.
8937 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8938 }
8939
8940 void finalizeInstrumentation() override {
8941 assert(!VAArgSize && !VAArgTLSCopy &&
8942 "finalizeInstrumentation called twice");
8943 IRBuilder<> IRB(MSV.FnPrologueEnd);
8944 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8945 Value *CopySize = VAArgSize;
8946
8947 if (!VAStartInstrumentationList.empty()) {
8948 // If there is a va_start in this function, make a backup copy of
8949 // va_arg_tls somewhere in the function entry block.
8950 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8951 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8952 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8953 CopySize, kShadowTLSAlignment, false);
8954
8955 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8956 Intrinsic::umin, CopySize,
8957 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8958 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8959 kShadowTLSAlignment, SrcSize);
8960 }
8961
8962 // Instrument va_start.
8963 // Copy va_list shadow from the backup copy of the TLS contents.
8964 for (CallInst *OrigInst : VAStartInstrumentationList) {
8965 NextNodeIRBuilder IRB(OrigInst);
8966 Value *VAListTag = OrigInst->getArgOperand(0);
8967 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8968 Value *RegSaveAreaPtrPtr =
8969 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8970 PointerType::get(*MS.C, 0));
8971 Value *RegSaveAreaPtr =
8972 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8973 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8974 const DataLayout &DL = F.getDataLayout();
8975 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8976 const Align Alignment = Align(IntptrSize);
8977 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8978 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8979 Alignment, /*isStore*/ true);
8980 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8981 CopySize);
8982 }
8983 }
8984};
8985
 8986 // ARM32, LoongArch64, MIPS and RISCV share the same calling conventions
 8987 // regarding VAArgs.
8988using VarArgARM32Helper = VarArgGenericHelper;
8989using VarArgRISCVHelper = VarArgGenericHelper;
8990using VarArgMIPSHelper = VarArgGenericHelper;
8991using VarArgLoongArch64Helper = VarArgGenericHelper;
8992
8993/// A no-op implementation of VarArgHelper.
8994struct VarArgNoOpHelper : public VarArgHelper {
8995 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8996 MemorySanitizerVisitor &MSV) {}
8997
8998 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8999
9000 void visitVAStartInst(VAStartInst &I) override {}
9001
9002 void visitVACopyInst(VACopyInst &I) override {}
9003
9004 void finalizeInstrumentation() override {}
9005};
9006
9007} // end anonymous namespace
9008
9009static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
9010 MemorySanitizerVisitor &Visitor) {
 9011 // VarArg handling is implemented only for the targets handled below; on any
 9012 // other platform the no-op helper is used and false positives are possible.
9013 Triple TargetTriple(Func.getParent()->getTargetTriple());
9014
9015 if (TargetTriple.getArch() == Triple::x86)
9016 return new VarArgI386Helper(Func, Msan, Visitor);
9017
9018 if (TargetTriple.getArch() == Triple::x86_64)
9019 return new VarArgAMD64Helper(Func, Msan, Visitor);
9020
9021 if (TargetTriple.isARM())
9022 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9023
9024 if (TargetTriple.isAArch64())
9025 return new VarArgAArch64Helper(Func, Msan, Visitor);
9026
9027 if (TargetTriple.isSystemZ())
9028 return new VarArgSystemZHelper(Func, Msan, Visitor);
9029
9030 // On PowerPC32 VAListTag is a struct
9031 // {char, char, i16 padding, char *, char *}
9032 if (TargetTriple.isPPC32())
9033 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
9034
9035 if (TargetTriple.isPPC64())
9036 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
9037
9038 if (TargetTriple.isRISCV32())
9039 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9040
9041 if (TargetTriple.isRISCV64())
9042 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9043
9044 if (TargetTriple.isMIPS32())
9045 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9046
9047 if (TargetTriple.isMIPS64())
9048 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9049
9050 if (TargetTriple.isLoongArch64())
9051 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
9052 /*VAListTagSize=*/8);
9053
9054 return new VarArgNoOpHelper(Func, Msan, Visitor);
9055}
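// (Editorial usage sketch, not part of the upstream file: the visitor defined
// earlier in this file owns one helper per instrumented function, roughly
//   std::unique_ptr<VarArgHelper> VAHelper(CreateVarArgHelper(F, MS, *this));
//   ...VAHelper->visitCallBase(CB, IRB) at each call site;
//   ...VAHelper->visitVAStartInst / visitVACopyInst for the va_list intrinsics;
//   ...VAHelper->finalizeInstrumentation() once the function has been visited.)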
9056
9057bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
9058 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
9059 return false;
9060
9061 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
9062 return false;
9063
9064 MemorySanitizerVisitor Visitor(F, *this, TLI);
9065
 9066 // Clear out memory attributes.
 9067 AttributeMask B;
 9068 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
9069 F.removeFnAttrs(B);
9070
9071 return Visitor.runOnFunction();
9072}