Three of my recent posts have leaned heavily on MemorySSA — the TBAA walkthrough used it implicitly, and the MemCpyOpt and GVN posts mention it on almost every page. This post is the dedicated one. If you want to understand why modern LLVM passes can cheaply answer questions like “does any store between this call and that load write to the same memory?”, you want to understand MemorySSA.

All line numbers here refer to llvm/lib/Analysis/MemorySSA.cpp and llvm/include/llvm/Analysis/MemorySSA.h. Test cases come from llvm/test/Analysis/MemorySSA/.

The problem MemorySSA solves

Regular SSA gives you a dataflow graph for values. If you see %y = add i32 %x, 1, you know immediately which instruction defined %x — it’s the single definition, and the use-def link is right there.

Memory has no such property in IR. A load doesn’t carry a pointer to “the store that produced the value I’m reading”. The relevant store could be anywhere earlier in the function — possibly behind a phi, possibly behind a chain of calls, possibly aliasing through GEPs. Classically, finding that store requires walking backward through every intervening memory operation and asking alias analysis “does this one clobber me?”.

That backward walk is correct but expensive. On a function with thousands of memory ops and cycles, it easily becomes the bottleneck of any pass that queries it.

MemorySSA solves this by building an SSA-shaped representation over memory itself: every memory-touching instruction becomes a MemoryAccess node, merge points get MemoryPhi nodes, and a cached walker answers “what clobbers this?” queries in amortized constant time.

The three node kinds

Every memory-touching instruction in the function gets exactly one of these (MemorySSA.h:140-661):

There’s also a sentinel LiveOnEntryDef — a synthetic MemoryDef representing “all memory that existed when this function began”. Loads that don’t have any visible prior store in the function point to this.

Reading the header (MemorySSA.h:20-53), you get an annotated IR example:

define i32 @main() {
entry:
  ; 1 = MemoryDef(liveOnEntry)
  %call = call noalias i8* @_Znwm(i64 4)
  ; 2 = MemoryDef(1)
  store i32 5, i32* %0, align 4
  ; MemoryUse(2)
  %2 = load i32* %0, align 4
  ret i32 %2
}

Those ; 1 = MemoryDef(...) and ; MemoryUse(...) comments are what -passes=print<memoryssa> adds to the IR when it dumps the analysis. The number (1, 2) is the version, and the parenthesized operand is the access this one depends on. Reading the annotations tells you: “the load sees the store (version 2), which came after the allocation (version 1), which started from live-on-entry.”

How construction works

MemorySSA is built using exactly the same algorithm as regular SSA — dominance frontiers and renaming — just applied to memory instead of individual values. The entry point is buildMemorySSA (MemorySSA.cpp:1529-1598).

Phase 1: create the sentinel (MemorySSA.cpp:1537-1538):

LiveOnEntryDef.reset(new MemoryDef(StartingPoint.getContext(), nullptr,
                                   nullptr, &StartingPoint, NextID++));

LiveOnEntryDef is not attached to any instruction in the IR. It’s a virtual MemoryDef with version 0, representing memory visible at function entry (arguments, globals, allocations made in prior callers).

Phase 2: scan instructions (MemorySSA.cpp:1546-1567):

for (BasicBlock &B : Blocks) {
  for (Instruction &I : B) {
    MemoryUseOrDef *MUD = createNewAccess(&I, &BAA);
    if (!MUD) continue;
    Accesses->push_back(MUD);
    if (isa<MemoryDef>(MUD)) {
      DefiningBlocks.insert(&B);
    }
  }
}

Every instruction is asked: do you touch memory? If yes, a MemoryUse or MemoryDef is created for it. Blocks containing at least one MemoryDef are collected into DefiningBlocks — we need them in Phase 3.

Phase 3: place MemoryPhi nodes (MemorySSA.cpp:1515-1526). This uses the classical Cytron et al. SSA placement algorithm: compute the iterated dominance frontier of the set of defining blocks, and put a phi at each block in the IDF.

void MemorySSA::placePHINodes(const SmallPtrSetImpl<BasicBlock *> &DefiningBlocks) {
  ForwardIDFCalculator IDFs(*DT);
  IDFs.setDefiningBlocks(DefiningBlocks);
  SmallVector<BasicBlock *, 32> IDFBlocks;
  IDFs.calculate(IDFBlocks);

  for (auto &BB : IDFBlocks)
    createMemoryPhi(BB);
}

This is the “sparse” in “sparse SSA”: you don’t get a MemoryPhi at every merge point in the CFG, only at merge points where two distinct memory definitions could reach. If a block has three predecessors but they all pass through the same MemoryDef, no phi is needed.

Phase 4: rename (MemorySSA.cpp:1570-1598). The standard SSA renaming pass walks the dominator tree, maintaining a “current memory definition” for each path. At each MemoryUse, it sets the use’s defining access to the current def. At each MemoryDef, it updates the current def. MemoryPhis get their operands filled in from predecessors’ current defs.

By the end, every MemoryUse points at exactly one MemoryAccess (either a MemoryDef, a MemoryPhi, or LiveOnEntryDef). Every MemoryDef has one “defining access” pointer to the access it came after.

But “defining access” isn’t the clobber

Here’s the subtlety that took me longest to internalize. The defining access of a MemoryUse is not necessarily the thing that clobbers it — it’s just the most recent memory op along the def chain.

Consider:

; 1 = MemoryDef(liveOnEntry)
store i32 1, ptr %x
; 2 = MemoryDef(1)
store i32 2, ptr %y   ; aliasing with %x? Probably not.
; MemoryUse(2)
%v = load i32, ptr %x

After construction, the load’s defining access is 2 — the most recent MemoryDef in program order. But the store at 2 writes to %y, not %x. The true clobber of the load is 1. Finding that true clobber is the job of the walker.

The caching clobber walker

The walker is CachingWalker (MemorySSA.cpp:1022-1063). Its main entry point is getClobberingMemoryAccess:

MemoryAccess *getClobberingMemoryAccess(MemoryAccess *MA,
                                        BatchAAResults &BAA) override {
  unsigned UpwardWalkLimit = MaxCheckLimit;
  return getClobberingMemoryAccess(MA, BAA, UpwardWalkLimit);
}

Under the hood, it walks the defining-access chain upward, asking alias analysis at each MemoryDef “do you actually clobber my location?”. The first one that says yes is returned. Roughly (MemorySSA.cpp:557-596):

UpwardsWalkResult walkToPhiOrClobber(DefPath &Desc, ...) const {
  for (MemoryAccess *Current : def_chain(Desc.Last)) {
    if (auto *MD = dyn_cast<MemoryDef>(Current)) {
      if (!--*UpwardWalkLimit) return {Current, true};

      if (instructionClobbersQuery(MD, Desc.Loc, Query->Inst, *AA))
        return {MD, true};
    }
  }
  return {Desc.Last, false};
}

There’s a MaxCheckLimit (default 100) to prevent pathological walks. If the walker can’t find an answer in 100 steps, it returns conservatively.

When the walk hits a MemoryPhi, the algorithm has to recurse on each incoming value, find the clobber on each path, and combine the results. This is tryOptimizePhi (MemorySSA.cpp:771+). If every incoming path ends at the same clobber, the phi gets short-circuited — the walker returns that clobber directly, skipping the phi.

The caching part. The true clobber for a MemoryDef is stored in the def itself, in the “optimized access” slot (MemorySSA.h:392-395):

void setOptimized(MemoryAccess *MA) {
  setOperand(1, MA);
  OptimizedID = MA->getID();
}

Next time the walker is asked about the same def, it returns the cached result immediately. This is what makes the walker amortized O(1): each MemoryDef’s clobber is computed at most once, even if queried many times.

What MemorySSA printing looks like

The annotated format is exactly what you get from opt -passes='print<memoryssa>'. From llvm/test/Analysis/MemorySSA/print-walker.ll:

; 1 = MemoryDef(liveOnEntry)->liveOnEntry - clobbered by liveOnEntry
store i8 42, ptr %a1

; 2 = MemoryDef(1)->liveOnEntry - clobbered by liveOnEntry
store i8 42, ptr %a2

; MemoryUse(1) - clobbered by 1 = MemoryDef(liveOnEntry)->liveOnEntry
%l1 = load i8, ptr %a1

Reading those lines:

The “def chain” (MemoryUse → defining access) and the “clobber chain” (MemoryUse → walker result) are two different things, and the printer shows both.

A tiny worked example

From llvm/test/Analysis/MemorySSA/function-clobber.ll:

@g = external global i32
declare void @modifyG()

define i32 @foo(i1 %arg) {
; CHECK: MemoryUse(liveOnEntry)
; CHECK-NEXT: %1 = load i32
  %1 = load i32, ptr @g

; CHECK: 1 = MemoryDef(liveOnEntry)
; CHECK-NEXT: store i32 4
  store i32 4, ptr @g, align 4

; CHECK: 2 = MemoryDef(1)
; CHECK-NEXT: call void @modifyG()
  call void @modifyG()

; CHECK: MemoryUse(2)
; CHECK-NEXT: %2 = load i32
  %2 = load i32, ptr @g

  %3 = add i32 %2, %1
  ret i32 %3
}

Walking through:

  1. The first load has no prior def in the function, so its defining access is liveOnEntry. Its clobber is also liveOnEntry — nothing in this function could have written @g yet.
  2. The store is 1 = MemoryDef(liveOnEntry). It’s the first def.
  3. The call to @modifyG() might modify any global, including @g. It’s recorded as 2 = MemoryDef(1).
  4. The second load is MemoryUse(2). When asked “what clobbers this load?”, the walker starts at 2, checks “does @modifyG() possibly write to @g?”, alias analysis says yes, so 2 is returned.

For an optimization like GVN that wants to decide “is this second load the same value as the first one?”, the answer is: their clobbers differ (liveOnEntry vs 2), so no — and GVN correctly leaves both loads in place. Without MemorySSA, GVN would have had to walk every instruction between the two loads and alias-check each; with MemorySSA, it’s a single lookup.

Why the sparse design matters

Two property of MemorySSA are worth pausing on:

It’s sparse. A function with a million instructions, of which only a hundred touch memory, has a MemorySSA with about a hundred MemoryUse/Def nodes plus a handful of MemoryPhis. You pay for memory ops, not for the rest of the function.

It’s cached. Every clobber query gets memoized on the node it starts from. A pass that does a million queries on the same function pays maybe 10x the cost of the first query, not 1 million x.

Compare this to the classical alternative — “for each load, walk backward through every instruction, asking alias analysis at each step”. That’s O(N²) in the worst case and has no caching story. Passes like MemCpyOpt, DSE, LICM, and GVN in their modern forms all rely on MemorySSA to be queryable at this scale; it’s a foundational piece of middle-end infrastructure.

If you want to keep going on MemorySSA itself, the header comment in MemorySSA.h:9-83 is one of the better written in the tree — it has both the motivation and the algorithm. After that:

And if you want to see MemorySSA in use, my MemCpyOpt post has a handful of writtenBetween(MSSA, ...) calls whose semantics are now less mysterious. Every one of them is, under the hood, a clobber-walker query.