AddressSanitizer From the Inside: Shadow Memory, Red Zones, and Poison || Packed Bits

If you’ve ever written C or C++ and compiled with -fsanitize=address, you’ve used AddressSanitizer (ASan). Compile, run, and it catches buffer overflows, use-after-free, use-after-scope, double-free, and a handful of other bugs, printing a diagnostic that tells you exactly which bytes were accessed and from where. The runtime overhead is roughly 2x, which is remarkable for what it gives you.

The magic happens in two halves: a compiler pass that instruments every load, store, and alloca, and a runtime library (part of compiler-rt) that maintains shadow memory and prints the reports. This post is about the compiler pass — what IR it rewrites and why.

All line numbers refer to llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp. Test cases live under llvm/test/Instrumentation/AddressSanitizer/.

The big idea: shadow memory

ASan’s central trick is shadow memory. For every 8 bytes of your program’s addressable memory, ASan keeps 1 byte of shadow state elsewhere, recording “how many of these 8 bytes are currently valid to access”.

The mapping is trivial to compute:

shadow_address = (app_address >> 3) + ShadowOffset

Shift the address right by 3 (divide by 8), add a per-platform offset, and you’re at the shadow byte. The offset is chosen so that the shadow region lands in unused virtual address space. On Linux x86_64 it’s 0x7fff8000; on AArch64 it’s 1 << 36; and so on (constants in AddressSanitizer.cpp:99-127).
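
As a quick sanity check on the formula, here is a minimal C++ sketch of the mapping using the Linux x86_64 constants quoted above. The names kShadowScale, kShadowOffset, and memToShadowAddr are made up for this post; only the arithmetic comes from the text.

```cpp
#include <cstdint>

// Hypothetical names; the values are the Linux x86_64 defaults quoted above.
constexpr unsigned kShadowScale = 3;                    // 8 app bytes per shadow byte
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;  // 2147450880 in decimal

// Map an application address to the address of its shadow byte.
constexpr std::uint64_t memToShadowAddr(std::uint64_t appAddr) {
  return (appAddr >> kShadowScale) + kShadowOffset;
}
```

All eight addresses of an aligned 8-byte granule map to the same shadow byte, and the next granule maps to the next shadow byte. In decimal the offset is 2147450880, which is the constant that appears in the IR listings later in the post.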

The shadow byte’s value encodes whether the corresponding 8 application bytes are accessible:

0x00: all 8 bytes are accessible.
0x01 to 0x07: only the first N bytes are accessible (a partial granule at the tail of an allocation whose size isn’t a multiple of 8).
0xF5, 0xFA, 0xFB, 0xFD, …: distinct “poisoned” values, each encoding a reason the memory isn’t accessible (red zone, freed, use-after-scope, use-after-return, etc.). All of them have the high bit set, so they are negative when read as a signed i8, and that is the only property the inline check relies on. The runtime uses the specific byte to generate a more informative error report.

Every instrumented load and store checks its shadow byte before accessing memory. If the byte is zero, the access proceeds. If it is non-zero, accesses narrower than 8 bytes get a second comparison that decides whether the bytes actually being touched are addressable; only when they are not does ASan call into the runtime to report the bug.
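
The encoding is small enough to write down as a decoder. This is only a sketch: addressableBytes is a hypothetical helper, and its handling of values between 0x08 and 0x7F (which the encoding described here never produces) is my own choice.

```cpp
#include <cstdint>

// How many leading bytes of an 8-byte granule are addressable, per the list above.
// Values 0x08..0x7F are not part of the encoding described here; this sketch
// treats them, like the poison values, as "nothing addressable".
constexpr unsigned addressableBytes(std::uint8_t shadow) {
  return shadow == 0 ? 8u : (shadow <= 7 ? static_cast<unsigned>(shadow) : 0u);
}
```

A shadow byte of 0x05 therefore means bytes 0 through 4 of the granule may be touched and bytes 5 through 7 may not.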

Instrumenting loads and stores

The entry point for load/store instrumentation is instrumentMop (AddressSanitizer.cpp:1778-1852), which delegates to instrumentAddress (AddressSanitizer.cpp:1947-2027). In pseudocode, the IR it emits looks like this:

; Original:
%tmp1 = load i32, ptr %a, align 4

; After instrumentation:
%0 = ptrtoint ptr %a to i64
%1 = lshr i64 %0, 3           ; divide by 8
%2 = add i64 %1, 2147450880   ; add shadow offset
%3 = inttoptr i64 %2 to ptr
%4 = load i8, ptr %3, align 1 ; load shadow byte
%5 = icmp ne i8 %4, 0         ; fast-path check
br i1 %5, label %crash_block, label %ok, !prof !0

crash_block:
  ; slow-path refinement: is our specific access range actually poisoned?
  %7 = and i64 %0, 7           ; byte offset within 8-byte chunk
  %8 = add i64 %7, 3           ; last accessed byte (size - 1 = 3 for i32)
  %9 = trunc i64 %8 to i8
  %10 = icmp sge i8 %9, %4
  br i1 %10, label %report, label %ok

report:
  call void @__asan_report_load4(i64 %0)
  unreachable

ok:
  %tmp1 = load i32, ptr %a, align 4

What’s going on:

Shadow address computation (memToShadow, AddressSanitizer.cpp:1405-1419). The app address is converted to an integer, shifted right by Mapping.Scale (3 by default), and added to Mapping.Offset. On some platforms an OR is used instead of ADD, if the offset is a power-of-two aligned with the shift.
Fast path. Load the shadow byte and compare to zero. If the shadow is entirely zero, all 8 bytes are addressable and we proceed directly to the actual load. This is the common case — most memory is valid — and it’s fast: one extra load, one compare, one branch.
Slow path refinement. If the shadow is non-zero, we need to check whether our specific access overlaps a poisoned byte. The shadow byte might say “only first 5 bytes addressable” (value 5), and if our access is the first 4, we’re fine. The refinement logic (createSlowPathCmp, AddressSanitizer.cpp:1883-1899) computes the offset of our access within the 8-byte chunk and compares against the shadow value.
Report. If the refinement confirms a real violation, call the appropriate runtime function. There’s a specialized reporter for each power-of-two access size (__asan_report_load1, __asan_report_load2, __asan_report_load4, __asan_report_load8, __asan_report_load16) and a generic one for other sizes (__asan_report_load_n), with matching __asan_report_store* variants for writes.
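
The two comparisons in the IR can be modeled in a few lines of C++. This is a sketch of the logic, not the pass’s code: shouldReport is a hypothetical name, and it assumes the access does not straddle two granules, so a single shadow byte decides the outcome.

```cpp
#include <cstdint>

// Decide whether an access of `size` bytes at `addr` is reported, given the
// shadow byte of addr's granule. Assumes the access stays inside one granule.
constexpr bool shouldReport(std::uint64_t addr, unsigned size, std::int8_t shadow) {
  // Fast path: a zero shadow byte means all 8 bytes are addressable.
  // Slow path: compare the index of the last accessed byte with the shadow
  // value as signed 8-bit integers, like the icmp sge in the IR.
  return shadow != 0 &&
         static_cast<std::int8_t>((addr & 7) + size - 1) >= shadow;
}
```

With a shadow value of 5, a 4-byte access at offset 0 touches bytes 0..3 and passes, while the same access at offset 2 touches byte 5 and is reported. Poison values are negative as signed bytes, so every access compares greater-or-equal and is reported.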

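The branch weight metadata (!0 in the example, emitted at AddressSanitizer.cpp:2009) is a hint to LLVM’s own optimizer and code generator, not to the CPU’s branch predictor. The weights, 1 against 2²⁰ − 1, mark the non-zero-shadow branch as extremely unlikely, so block placement keeps the fast path as straight-line fall-through code and moves the refinement and report blocks out of line. The ratio is a fixed “this is cold” annotation rather than a measured frequency.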

Why 8:1 and not 1:1

There are two obvious alternatives. One shadow byte per application byte (1:1) would double the program’s memory footprint. One shadow bit per application byte costs the same memory as 8:1, but every check would need a bit extraction. Why does ASan settle on 1 byte of shadow per 8 bytes of app memory?

A byte is the smallest unit most architectures can load in one instruction, so a single shadow byte per 8 app bytes makes the check a byte load and a compare, with no bit extraction. On its own, a yes/no answer per 8 bytes would be too coarse: you’d miss small out-of-bounds accesses. That’s why the shadow also encodes “first N bytes addressable”. This works because allocations are at least 8-byte aligned, so the addressable part of a granule is always a prefix and a single count describes it exactly.

This encoding (one byte of shadow, 0 meaning “all 8 are OK”, 1..7 meaning “first N are OK”, and larger values meaning poison) is what lets the fast path stay trivial while still catching byte-level errors.
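
To see the encoding at work, consider an 8-byte-aligned allocation followed by a red zone. The sketch below computes the shadow byte for each granule. The function name is hypothetical, and kRedZone simply reuses one of the poison values listed earlier; which value a real runtime writes for a given red zone is not modeled here.

```cpp
#include <cstdint>

constexpr std::uint8_t kRedZone = 0xFA;  // assumed: any poison value would do

// Shadow byte for granule number `granule` (0-based) of an 8-byte-aligned
// allocation of `allocSize` bytes that is followed by a red zone.
constexpr std::uint8_t shadowByteFor(std::uint64_t allocSize, std::uint64_t granule) {
  return (granule + 1) * 8 <= allocSize
             ? 0x00                                                    // fully inside
             : granule * 8 < allocSize
                   ? static_cast<std::uint8_t>(allocSize - granule * 8)  // partial tail
                   : kRedZone;                                         // past the end
}
```

For a 13-byte allocation the shadow reads 0x00, 0x05, then poison: the first granule is fully valid, the second has 5 valid bytes, and everything after that is red zone. Reading byte 13 lands in the second granule at offset 5, which the slow-path comparison rejects.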

Instrumenting the stack

Stack instrumentation is done by FunctionStackPoisoner (AddressSanitizer.cpp:1063-1290). Its job: pack all the function’s allocas into one contiguous frame, insert red zones between them, and poison those red zones so any overflow between variables is caught.

The input might be:

%a = alloca [10 x i8]
%b = alloca [20 x i8]
%c = alloca [30 x i8]

After instrumentation:

; Single packed frame for everything, aligned to 32 bytes (red zone size)
%MyAlloca = alloca i8, i64 192, align 32

; Layout inside MyAlloca:
;   [ 32 bytes header (metadata)       ]
;   [ 10 bytes data for %a             ]
;   [ 22 bytes RED ZONE (poisoned)     ]
;   [ 20 bytes data for %b             ]
;   [ 12 bytes RED ZONE                ]
;   [ 30 bytes data for %c             ]
;   [ 66 bytes RED ZONE (to end of frame) ]
; = 32 + 10 + 22 + 20 + 12 + 30 + 66 = 192

; At function entry: poison shadow for red zones
store i64 <poison pattern>, ptr %shadow_ptr

; Replace each original alloca with a GEP into MyAlloca
%a = getelementptr inbounds i8, ptr %MyAlloca, i32 32
%b = getelementptr inbounds i8, ptr %MyAlloca, i32 64
%c = getelementptr inbounds i8, ptr %MyAlloca, i32 96

; At function exit: unpoison
store i64 0, ptr %shadow_ptr
ret void

The red zone size constant is kAllocaRzSize = 32 (AddressSanitizer.cpp:187). Every original alloca is padded out to a multiple of this, so walking off the end of a variable lands immediately in a poisoned red zone, where the shadow check on the offending load or store fires and the runtime reports it.
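
To make the “poison pattern” in the listing concrete, here is a hedged sketch that derives the shadow bytes for that exact frame (192 bytes, variables at offsets 32, 64, and 96). StackVar and frameShadowByte are hypothetical names. The values 0xF1, 0xF2, and 0xF3 are what I understand the runtime to use for left, middle, and right stack red zones; treat them as an assumption, since the listing itself doesn’t show them.

```cpp
#include <cstdint>

struct StackVar { std::uint64_t offset, size; };  // hypothetical descriptor

// The frame from the listing above.
constexpr StackVar kVars[] = {{32, 10}, {64, 20}, {96, 30}};
constexpr std::uint64_t kFrameSize = 192;
constexpr std::uint64_t kGranules = kFrameSize / 8;  // one shadow byte each

// Assumed stack red zone markers (left / middle / right).
constexpr std::uint8_t kLeftRz = 0xF1, kMidRz = 0xF2, kRightRz = 0xF3;

// Shadow byte covering frame bytes [granule * 8, granule * 8 + 8).
constexpr std::uint8_t frameShadowByte(std::uint64_t granule) {
  const std::uint64_t begin = granule * 8;
  if (begin < kVars[0].offset) return kLeftRz;  // header before the first variable
  for (const StackVar &v : kVars) {
    if (begin >= v.offset && begin < v.offset + v.size) {
      const std::uint64_t left = v.offset + v.size - begin;  // bytes still inside v
      return left >= 8 ? 0x00 : static_cast<std::uint8_t>(left);
    }
  }
  const StackVar &last = kVars[2];
  return begin >= last.offset + last.size ? kRightRz : kMidRz;
}
```

Under these assumptions the 24 shadow bytes read f1 f1 f1 f1, 00 02, f2 f2, 00 00 04, f2, 00 00 00 06, then f3 to the end. Eight consecutive shadow bytes cover 64 bytes of frame, which is why a few wide stores like the store i64 in the listing are enough to write the pattern.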

The packing into one alloca is done by ComputeASanStackFrameLayout (called at AddressSanitizer.cpp:3579). The function that writes the poison/unpoison pattern is poisonAlloca (AddressSanitizer.cpp:3812-3820):

void FunctionStackPoisoner::poisonAlloca(Value *V, uint64_t Size,
                                         IRBuilder<> &IRB, bool DoPoison) {
  Value *AddrArg = IRB.CreatePointerCast(V, IntptrTy);
  Value *SizeArg = ConstantInt::get(IntptrTy, Size);
  RTCI.createRuntimeCall(
      IRB, DoPoison ? AsanPoisonStackMemoryFunc : AsanUnpoisonStackMemoryFunc,
      {AddrArg, SizeArg});
}

Poisoning writes the marker pattern to shadow memory; unpoisoning writes zero. Both go through the runtime so the runtime can keep its own bookkeeping in sync.

Lifetime markers and use-after-scope

ASan also listens for llvm.lifetime.start and llvm.lifetime.end intrinsics. These are emitted by Clang when a variable enters or leaves its lexical scope:

int *p;
{
  int a = 0;       // llvm.lifetime.start(%a)
  p = &a;
}                  // llvm.lifetime.end(%a)
use(*p);           // use after scope!

On lifetime.start, ASan unpoisons the variable’s bytes. On lifetime.end, it poisons them again. Any load or store that reaches the variable outside its scope, typically through a pointer that outlived it, hits poisoned shadow and gets reported. This is how ASan catches use-after-scope bugs.
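
The effect on the shadow byte of a small local can be sketched as follows. The helper names are hypothetical, and 0xF8 is the value I understand the runtime to use for out-of-scope stack memory, so treat it as an assumption. The local is assumed to sit at the start of its granule and to be at most 8 bytes long.

```cpp
#include <cstdint>

constexpr std::uint8_t kUseAfterScope = 0xF8;  // assumed poison value

// Shadow byte for the granule holding a local of `size` bytes (1..8).
constexpr std::uint8_t localShadow(bool inScope, unsigned size) {
  return !inScope ? kUseAfterScope
                  : static_cast<std::uint8_t>(size == 8 ? 0 : size);
}

// The inline check for an access of `accessSize` bytes at offset 0 of the granule.
constexpr bool reports(std::uint8_t shadow, unsigned accessSize) {
  return shadow != 0 &&
         static_cast<std::int8_t>(accessSize - 1) >=
             static_cast<std::int8_t>(shadow);
}
```

For a 4-byte int the in-scope shadow is 0x04, so a 4-byte access passes the check. After lifetime.end the byte becomes 0xF8, which is negative as a signed byte, and the same access is reported.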

Globals

Global variables are handled by ModuleAddressSanitizer::instrumentGlobals (AddressSanitizer.cpp:990), which runs once per module. Each global gets padded with a red zone, the compiler registers metadata with the runtime at startup, and the runtime marks the red zones as poisoned.

The registration happens via module constructors:

define internal void @asan.module_ctor() {
  call void @__asan_init()
  call void @__asan_register_elf_globals(i64 ptrtoint (...), ...)
  ret void
}

define internal void @asan.module_dtor() {
  call void @__asan_unregister_elf_globals(...)
  ret void
}

ELF, Mach-O, and COFF each have slightly different machinery for this, with platform-specific sections and metadata formats. The principle is the same: the compiler emits a table, the constructor tells the runtime about it, and the runtime poisons the red zones.
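
A toy model of that table makes the principle concrete. Everything here is illustrative: GlobalDesc keeps only three fields (start, size, and size including the red zone), the addresses and sizes are invented, and inGlobalRedzone stands in for what the runtime achieves by poisoning shadow memory.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down descriptor: one entry per instrumented global.
struct GlobalDesc { std::uint64_t beg, size, sizeWithRedzone; };

// Invented example table, as the compiler might emit it.
constexpr GlobalDesc kTable[] = {
    {0x1000, 4, 32},   // a 4-byte int padded out to 32 bytes
    {0x1020, 40, 96},  // a 40-byte array followed by a 56-byte red zone
};

// True if `addr` falls in the red zone of any registered global, i.e. in
// [beg + size, beg + sizeWithRedzone).
constexpr bool inGlobalRedzone(std::uint64_t addr) {
  for (const GlobalDesc &g : kTable)
    if (addr >= g.beg + g.size && addr < g.beg + g.sizeWithRedzone) return true;
  return false;
}
```

In the real system the runtime doesn’t answer this question with a table scan on every access. It walks the table once at registration time and writes poison into the shadow bytes covering each red zone, after which the ordinary inline check does the rest.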

The runtime interface

ASan’s runtime functions all start with __asan_. The compiler pass emits calls to them; the linker resolves against compiler-rt’s ASan implementation. The most common ones:

Load/store checks (outlined callbacks, emitted in place of the inline sequence when instrumentation is outlined, e.g. with Clang’s -fsanitize-address-outline-instrumentation):

__asan_load1, __asan_load2, __asan_load4, __asan_load8, __asan_load16
__asan_store1 through __asan_store16
__asan_loadN and __asan_storeN (generic, takes size as an argument)

Error reporting (slow path):

__asan_report_load1 through __asan_report_load16
__asan_report_store1 through __asan_report_store16
__asan_report_load_n, __asan_report_store_n

Stack lifetime:

__asan_poison_stack_memory(addr, size)
__asan_unpoison_stack_memory(addr, size)
__asan_stack_malloc_N(size): for use-after-return detection with a fake-frame allocator; the N in the name is a size-class index, not a byte count
__asan_stack_free_N(addr, size)

Globals:

__asan_register_elf_globals(...)
__asan_unregister_elf_globals(...)

Miscellaneous:

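__asan_init(): initialization
__asan_handle_no_return(): called before noreturn calls (longjmp, throw, etc.) to unpoison the current stack and fake-frame state, since the frames being skipped never run their own unpoisoning code
__asan_memcpy, __asan_memmove, __asan_memset: checked replacements that the pass substitutes for the memcpy, memmove, and memset intrinsics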

The prefix is configurable via -asan-memory-access-callback-prefix, but the default __asan_ is what every real ASan build uses.
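
The naming scheme for the check and report functions is regular enough to capture in a small helper. This is a sketch with hypothetical function names; it assumes fixed-size entry points exist for 1, 2, 4, 8, and 16 bytes (the 16-byte ones come from the runtime’s interface rather than from the lists in this post) and that everything else goes to the generic variants.

```cpp
#include <string>

// Sizes with a dedicated entry point; anything else uses the generic one.
constexpr bool hasFixedSizeCallback(unsigned size) {
  return size == 1 || size == 2 || size == 4 || size == 8 || size == 16;
}

// Build the runtime symbol name for an outlined check (isReport == false)
// or an error reporter (isReport == true).
std::string callbackName(bool isReport, bool isWrite, unsigned size) {
  std::string name = "__asan_";
  if (isReport) name += "report_";
  name += isWrite ? "store" : "load";
  if (hasFixedSizeCallback(size)) return name + std::to_string(size);
  return name + (isReport ? "_n" : "N");  // generic variants take the size as an argument
}
```

For example, callbackName(true, false, 4) yields __asan_report_load4, and callbackName(false, true, 3) yields __asan_storeN.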

A complete worked example

Starting IR:

define i32 @test_load(ptr %a) sanitize_address {
entry:
  %tmp1 = load i32, ptr %a, align 4
  ret i32 %tmp1
}

After running opt -passes=asan:

define i32 @test_load(ptr %a) #0 {
entry:
  %0 = ptrtoint ptr %a to i64
  %1 = lshr i64 %0, 3
  %2 = add i64 %1, 2147450880
  %3 = inttoptr i64 %2 to ptr
  %4 = load i8, ptr %3, align 1
  %5 = icmp ne i8 %4, 0
  br i1 %5, label %6, label %12, !prof !0

6:
  %7 = and i64 %0, 7
  %8 = add i64 %7, 3
  %9 = trunc i64 %8 to i8
  %10 = icmp sge i8 %9, %4
  br i1 %10, label %11, label %12

11:
  call void @__asan_report_load4(i64 %0) #4
  unreachable

12:
  %tmp1 = load i32, ptr %a, align 4
  ret i32 %tmp1
}

attributes #0 = { sanitize_address }
!0 = !{!"branch_weights", i32 1, i32 1048575}

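A single 4-byte load became 14 new instructions spread over three extra basic blocks. The overhead is nontrivial per load, but:

The fast path (shadow is zero) is a shift, an add, a byte load, and a compare before the branch; the ptrtoint and inttoptr casts cost nothing at the machine level.
The slow path is only taken when the shadow byte is non-zero, which in correct code means a partially addressable granule at the end of an object.
The branch weight annotation lets the code generator keep the slow-path and report blocks out of the hot instruction stream.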

Empirically, ASan-enabled code runs at 40-60% of the native speed, which is remarkable given how many checks are being inserted.

Why you can’t turn this off selectively

A question I see occasionally: “can we instrument only the loads that could fail?”. The answer is no, at least not from within an optimization pass. Once instrumentation has run, the load-store pattern is opaque to later passes; we can’t un-instrument specific accesses. What ASan does have is a compile-time skip for provably safe accesses (isSafeAccess, see instrumentMop), typically accesses to locals of known size with in-bounds constant indices. Opting out is otherwise coarse-grained: the pass only touches functions that carry the sanitize_address attribute, so a whole function can be excluded, but not an individual access inside an instrumented one. Every other access gets instrumented, because the whole point of the sanitizer is to catch bugs the compiler couldn’t have foreseen.
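
The kind of proof involved is simple interval arithmetic. The sketch below shows the idea only; provablyInBounds is a hypothetical name, and the real pass works on LLVM’s object-size analysis rather than on plain integers.

```cpp
#include <cstdint>

// An access of `accessSize` bytes at constant byte `offset` into an object of
// known `objectSize` is safe when the whole range [offset, offset + accessSize)
// lies inside the object. Written to avoid unsigned overflow.
constexpr bool provablyInBounds(std::uint64_t objectSize, std::int64_t offset,
                                std::uint64_t accessSize) {
  return offset >= 0 &&
         objectSize >= static_cast<std::uint64_t>(offset) &&
         objectSize - static_cast<std::uint64_t>(offset) >= accessSize;
}
```

For a 40-byte local array of int, element 9 (offset 36, size 4) passes and needs no check, while an access at offset 37 or a negative offset fails the test and keeps its instrumentation.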

What ASan doesn’t catch

For completeness: ASan’s shadow model is great for spatial errors (buffer overflows) and temporal ones (use-after-free, use-after-scope), plus double-free, but it doesn’t catch:

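Data races: that’s ThreadSanitizer’s job.
Uninitialized reads: MemorySanitizer.
Undefined behavior like signed overflow: UndefinedBehaviorSanitizer.
Leaks: the shadow model has nothing to say about them, but LeakSanitizer is built into the ASan runtime and enabled by default on Linux, so an ASan build usually reports leaks anyway.

Each of these tools has its own instrumentation (LeakSanitizer is the exception and works purely at run time); they share some infrastructure (the Instrumentation directory) but are otherwise independent.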

Closing

AddressSanitizer is, to me, one of the most beautiful pieces of engineering in modern toolchains. The shadow-memory idea is straightforward in the abstract, but every detail (the 8:1 mapping, the byte encoding of partial validity, the fast-path/slow-path branching, the stack frame packing, the lifetime-marker integration) is the result of thinking very carefully about performance, correctness, and debuggability. And it’s all right there in one file, plus a runtime library, plus well over a decade of bug fixes.

If you’re tempted to read the source: start at instrumentMop (AddressSanitizer.cpp:1778) and follow the call chain through instrumentAddress and memToShadow. Then skim FunctionStackPoisoner::runOnFunction (AddressSanitizer.cpp:1111) to see the stack side. The test file llvm/test/Instrumentation/AddressSanitizer/basic.ll gives you an “expected output” to check your understanding against.
