GVN and PRE: How LLVM Deletes Redundant Work || Packed Bits

If you write a + b twice in a function, you want the compiler to compute it once. That’s the promise of Global Value Numbering — LLVM’s GVN pass — and it’s surprisingly powerful once you start looking at what it can handle. Two add instructions with the same operands? Collapse them. A load after a store to the same pointer? Replace the load with the stored value. A computation that exists on one control-flow path but is “partially redundant” with another? Hoist it to where it’s needed and delete duplicates.

That last case — partial redundancy — is where GVN stops being simple hash-consing and starts doing Partial Redundancy Elimination (PRE). This post walks through both parts: the value-numbering core, the load-forwarding layer, and the PRE logic that inserts compensation code to turn partial redundancies into full redundancies.

All line numbers refer to llvm/lib/Transforms/Scalar/GVN.cpp. The header is llvm/include/llvm/Transforms/Scalar/GVN.h.

Part 1: value numbering

The foundational idea is simple. Every instruction in the function gets assigned a value number — an integer such that two instructions with the same value number are guaranteed to compute the same value. If you find two instructions with the same value number, and one dominates the other, you can delete the dominated one and replace its uses with the dominator.

The mechanism is a hash-consed expression table. GVN defines an Expression struct (GVN.cpp:140-171):

struct llvm::GVNPass::Expression {
  uint32_t Opcode;
  bool Commutative = false;
  Type *Ty = nullptr;
  SmallVector<uint32_t, 4> VarArgs;

  bool operator==(const Expression &Other) const { ... }
  friend hash_code hash_value(const Expression &Value) { ... }
};

Notice: the operands are stored as value numbers, not values. This is what makes the scheme transitive. If %x = add i32 %a, %b gets value number 5, and later %y = add i32 %a, %b appears, we look up both %a and %b’s value numbers, form the expression (Add, [VN(a), VN(b)]), hash it, find it’s already in the table with VN 5, and conclude that %y also has value number 5.

The lookup-or-add routine is lookupOrAdd (GVN.cpp:650+):

uint32_t GVNPass::ValueTable::lookupOrAdd(Value *V) {
  DenseMap<Value *, uint32_t>::iterator VI = ValueNumbering.find(V);
  if (VI != ValueNumbering.end())
    return VI->second;

  auto *I = dyn_cast<Instruction>(V);
  if (!I) {
    ValueNumbering[V] = NextValueNumber;
    return NextValueNumber++;
  }

  Expression Exp = createExpr(I);
  uint32_t E = assignExpNewValueNum(Exp).first;
  ValueNumbering[V] = E;
  return E;
}

That recursion through operands is where the power comes from. Suppose you have:

%t1 = add i32 %a, %b
%t2 = mul i32 %t1, 2
%t3 = add i32 %a, %b
%t4 = mul i32 %t3, 2   ; same as %t2!

%t3 hashes to the same expression as %t1, so both get VN 5. Then %t4’s expression uses VN 5 for its first operand, same as %t2. Both multiplies get the same VN. GVN deletes %t3 and %t4.

The leader table

Knowing two instructions share a value number isn’t enough — you also need to know which one dominates the other, because only the dominator can be used as a replacement. That’s what the LeaderTable (GVN.h:264-323) is for. It maps each VN to a list of values that have that VN, plus the basic block each was defined in:

class LeaderMap {
  DenseMap<uint32_t, LeaderListNode> NumToLeaders;

  void insert(uint32_t N, Value *V, const BasicBlock *BB) { ... }
  iterator_range<leader_iterator> getLeaders(uint32_t N) { ... }
};

When GVN wants to replace an instruction with an earlier-equivalent one, it queries:

Value *GVNPass::findLeader(const BasicBlock *BB, uint32_t Num) {
  auto Leaders = LeaderTable.getLeaders(Num);
  for (const auto &Entry : Leaders) {
    if (DT->dominates(Entry.BB, BB)) {
      return Entry.Val;
    }
  }
  return nullptr;
}

“Give me a value with value number Num whose defining block dominates BB.” If one exists, the instruction in BB can be replaced.

Part 2: redundant load elimination

Value numbering by itself handles arithmetic. Loads are harder because their value depends on memory state — two load instructions with identical pointer operands don’t necessarily return the same value if there’s a store between them.

GVN handles loads specially via processLoad (GVN.cpp:2161-2213):

bool GVNPass::processLoad(LoadInst *L) {
  if (!MD) return false;

  MemDepResult Dep = MD->getDependency(L);

  if (Dep.isNonLocal())
    return processNonLocalLoad(L);

  auto AV = AnalyzeLoadAvailability(L, Dep, L->getPointerOperand());
  if (!AV) return false;

  Value *AvailableValue = AV->MaterializeAdjustedValue(L, L);
  L->replaceAllUsesWith(AvailableValue);
  salvageAndRemoveInstruction(L);
  ++NumGVNLoad;
  return true;
}

The machinery here is MemoryDependenceAnalysis (or in newer codepaths, MemorySSA). It answers “what’s the closest memory operation that this load depends on?”. If the dependency is a store to the same address, the load can be forwarded directly:

; Before
store i32 42, ptr %p
%x = load i32, ptr %p

; After GVN
store i32 42, ptr %p
%x = ... (uses the value 42 directly)

AnalyzeLoadAvailability (GVN.cpp:1331-1471) handles the common cases:

Store dependency with matching type. Use the stored value.
Earlier load of the same location. Use that load’s result (and delete the newer load).
Clobbering store with partial overlap. If the store covers a superset of what the load reads, extract the relevant bytes via shift/mask and use those.

That third case is why the function is 100+ lines: it has to figure out offsets, handle endianness, and generate the right extraction code.

Part 3: Partial Redundancy Elimination

Here’s where it gets interesting. Consider:

entry:
  %a = ...
  %b = ...
  br i1 %cond, label %then, label %else

then:
  %x = add i32 %a, %b
  ...
  br label %merge

else:
  ; no add here
  br label %merge

merge:
  %y = add i32 %a, %b     ; partially redundant!

The add in merge is redundant along the then path (we already computed it), but not along the else path (nothing computed it there). So we can’t just delete it. But we can hoist the add into else, making it fully available in merge, and then eliminate the redundant add in merge using a phi.

That’s PRE. The output looks like:

entry:
  br i1 %cond, label %then, label %else

then:
  %x = add i32 %a, %b
  br label %merge

else:
  %x.pre = add i32 %a, %b  ; inserted
  br label %merge

merge:
  %y = phi i32 [ %x, %then ], [ %x.pre, %else ]

One computation per path instead of two on the then path. Net instruction count went down on the hot path, and the phi is free.

PRE for scalars

The scalar version is performScalarPRE (GVN.cpp:2942-3106). The key shape of the code:

bool GVNPass::performScalarPRE(Instruction *CurInst) {
  uint32_t ValNo = VN.lookup(CurInst);

  unsigned NumWith = 0;
  unsigned NumWithout = 0;
  BasicBlock *PREPred = nullptr;

  for (BasicBlock *P : predecessors(CurrentBlock)) {
    uint32_t TValNo = VN.phiTranslate(P, CurrentBlock, ValNo, *this);
    Value *PredV = findLeader(P, TValNo);

    if (!PredV) {
      PREPred = P;
      ++NumWithout;
    } else {
      ++NumWith;
    }
  }

  if (NumWithout > 1 || NumWith == 0)
    return false;

  if (NumWithout != 0) {
    PREInstr = CurInst->clone();
    performScalarPREInsertion(PREInstr, PREPred, CurrentBlock, ValNo);
  }

  PHINode *Phi = PHINode::Create(CurInst->getType(), PredMap.size(), ...);
  for (auto &[V, BB] : PredMap) {
    if (V) Phi->addIncoming(V, BB);
    else   Phi->addIncoming(PREInstr, PREPred);
  }

  CurInst->replaceAllUsesWith(Phi);
  removeInstruction(CurInst);
  return true;
}

Walk through: for each predecessor of the current block, check if the value we’re about to compute is already available there (via findLeader). If exactly one predecessor lacks it, that’s where we insert the compensation. If more than one predecessor lacks it, PRE bails — the heuristic is “don’t insert in more than one place”, because each insertion is only profitable if most paths were already computing it.

The phiTranslate call is subtle: when crossing from predecessor P into the current block, operands that are phi nodes need to be replaced with their P-incoming value. This is what lets PRE work across phi boundaries.

PRE for loads

Load PRE is more ambitious because the “missing” value is a load rather than an arithmetic op. Consider:

block1:
  br i1 %C, label %block2, label %block3

block2:
  br label %block4

block3:
  store i32 0, ptr %p
  br label %block4

block4:
  %PRE = load i32, ptr %p

Along the block3 path, we know what the load would return: 0 (the stored value). Along the block2 path, we don’t — there’s no store there. So we insert a load in block2 and then use a phi to merge.

After load PRE:

block2:
  %pre.pre = load i32, ptr %p   ; inserted
  br label %block4

block3:
  store i32 0, ptr %p
  br label %block4

block4:
  %PRE = phi i32 [ 0, %block3 ], [ %pre.pre, %block2 ]

The original load in block4 is gone. Both paths compute the value once at most, and the phi does the merge for free.

The implementation is PerformLoadPRE (GVN.cpp:1659-1907), which is substantially longer than scalar PRE because it has to:

Check every predecessor — is the load value available there (via a store, an earlier load, etc.)?
For unavailable predecessors, insert a compensation load.
Handle pointer translation through phis (if the load’s address depends on a phi, the address on each incoming path is different).
Build the merging phi node and wire it up.
Update MemorySSA if it’s live.

Critical edges

Sometimes a predecessor that “lacks the value” is on a critical edge — an edge from a block with multiple successors into a block with multiple predecessors. You can’t just insert code on the source block, because that source has other successors that don’t need the computation. And you can’t insert in the destination because that’s where we’re trying to delete from.

The fix is to split the critical edge: insert a new block between the two, and put the compensation code there. splitCriticalEdges (GVN.cpp:3137-3149) is a thin wrapper:

BasicBlock *GVNPass::splitCriticalEdges(BasicBlock *Pred, BasicBlock *Succ) {
  BasicBlock *BB = SplitCriticalEdge(
      Pred, Succ,
      CriticalEdgeSplittingOptions(DT, LI, MSSAU)
          .unsetPreserveLoopSimplify());

  if (BB) {
    if (MD)
      MD->invalidateCachedPredecessors();
    InvalidBlockRPONumbers = true;
  }
  return BB;
}

After splitting, the “predecessor” PRE was going to insert into is actually the new intermediate block, which by construction has exactly one successor — the destination. Clean insertion point.

Load PRE detects the need explicitly (GVN.cpp:1746-1774):

if (Pred->getTerminator()->getNumSuccessors() != 1) {
  // Critical edge: either find a load to hoist into this pred, or split.
  if (LoadInst *LI = findLoadToHoistIntoPred(Pred, LoadBB, Load))
    CriticalEdgePredAndLoad[Pred] = LI;
  else
    CriticalEdgePredSplit.push_back(Pred);
}

for (BasicBlock *OrigPred : CriticalEdgePredSplit) {
  BasicBlock *NewPred = splitCriticalEdges(OrigPred, LoadBB);
  PredLoads[NewPred] = nullptr;
}

The first attempt is to find a load elsewhere in the predecessor’s successors that can be hoisted (a sort of “sink the shared part” play). If that fails, fall back to critical edge splitting.

Putting it together

The GVN pass’s top-level loop is iterateOnFunction (GVN.cpp:3173). It walks each basic block and, for each instruction, tries:

Simple value numbering. Look it up in the leader table; if a dominating equivalent exists, replace and delete.
Load forwarding. If it’s a load, ask processLoad whether a store or earlier load provides the value.
PRE. If the instruction is a candidate for scalar PRE (a binary op with all-available operands) or load PRE, try to hoist compensation code.

The process iterates until no more changes happen — GVN, like many LLVM passes, runs to a fixed point.

A simple PRE trace

From llvm/test/Transforms/GVN/PRE/pre-basic-add.ll:

; Before
entry:
  %0 = load i32, @H
  br i1 %cond, label %bb, label %bb1

bb:
  %3 = add i32 %0, 42
  store %3, @G
  br %bb1

bb1:
  %4 = add i32 %0, 42    ; partial: done in bb, not in entry->bb1
  store %4, @H

GVN notices that %4 has the same value number as %3. When it walks predecessors of bb1:

Predecessor bb: leader with this VN exists — it’s %3. NumWith++.
Predecessor entry: no leader with this VN. NumWithout++, PREPred = entry.

NumWithout == 1, so PRE proceeds. It checks the edge entry → bb1: that’s a critical edge (entry has two successors, bb1 has two preds), so split it. Now insert the compensation in the split block, and build the merging phi:

; After PRE
entry:
  %0 = load i32, @H
  br i1 %cond, label %bb, label %entry.bb1_crit_edge

entry.bb1_crit_edge:
  %.pre = add i32 %0, 42     ; inserted here
  br %bb1

bb:
  %3 = add i32 %0, 42
  store %3, @G
  br %bb1

bb1:
  %.pre.phi = phi i32 [ %.pre, entry.bb1_crit_edge ], [ %3, bb ]
  store %.pre.phi, @H

One add on each path, a cheap phi at the merge. The original %4 is gone.

Why this matters

GVN is one of the heaviest lifters in the LLVM middle-end. Every -O2 compilation runs it, and the transformations it performs — especially load forwarding and PRE — interact with almost every other optimization:

Inlining exposes more opportunities for GVN to see across what used to be function boundaries.
SROA creates clean scalar allocas that GVN’s load forwarding can then completely eliminate.
LICM moves loop-invariant loads out of loops, after which GVN can often forward them.
InstCombine canonicalizes arithmetic, making value numbering match more aggressively.

Read the header at GVN.h, then iterateOnFunction and processLoad, and finally performScalarPRE. In about two hours of careful reading you’ll have the whole pass mapped out — and you’ll understand why a lot of hand-written optimizations in frontends are just “do some canonicalization and let GVN clean up”.

GVN and PRE: How LLVM Deletes Redundant Work