Where LLVM's readonly, nocapture, and nonnull Come From || Packed Bits

Look at optimized LLVM IR and you’ll see attributes sprinkled everywhere:

declare void @memcpy(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64) #0

Some of these attributes come from Clang, some from function signatures in headers, some from __attribute__ annotations in source code. But a surprising number of them are inferred by LLVM itself. The compiler looks at a function body, proves some property about it, and tags the function with an attribute that downstream passes can rely on.

This post is about how that inference works. LLVM has two separate infrastructures for it: the classical FunctionAttrs pass and the newer, more powerful Attributor framework. They overlap in what they can prove, but the way they go about it is quite different.

All line numbers refer to files under llvm/lib/Transforms/IPO/: FunctionAttrs.cpp, Attributor.cpp, AttributorAttributes.cpp.

Why these attributes matter

Before we dig into how they’re inferred, let’s look at why anyone bothers. A handful of attributes and what they unlock:

readonly — function doesn’t write to memory. Enables Loop-Invariant Code Motion: a loop body that calls a readonly function can hoist the call out of the loop if its arguments don’t change.
readnone — function doesn’t read or write to memory. Stronger than readonly. Allows full CSE.
nocapture (on a pointer argument) — the callee doesn’t store the pointer anywhere or let it escape. This is what makes stack-move and call-slot optimizations (I wrote about them in the MemCpyOpt post) legal.
noalias (on a return value) — the returned pointer aliases nothing else in the program. This is the attribute on malloc, calloc, and similar allocators. It enables very aggressive alias analysis.
willreturn — function always returns. Means the call can’t infinite-loop, so compilers can move it out of loops without worrying about changing termination behavior.
nonnull (on a return value or argument) — the pointer is never null. Eliminates null checks.
nofree — function doesn’t free any memory. Enables reasoning about pointer lifetimes across calls.
norecurse — function never calls itself (directly or transitively). Simplifies stack usage analysis.

Most of these propagate: if f calls g and g is readonly, then f might be inferable as readonly too. The inference passes do this propagation automatically.

FunctionAttrs: SCC-based postorder

FunctionAttrs.cpp is the older, simpler pass. It’s structured around the call graph: specifically, around its strongly-connected components (SCCs). The pass visits SCCs in postorder — callees before callers — so by the time we’re looking at a function, every function it calls has already been analyzed.

The top-level entry point is PostOrderFunctionAttrsPass::run (FunctionAttrs.cpp:2315). It delegates to deriveAttrsInPostOrder (FunctionAttrs.cpp:2273):

SmallPtrSet<Function *, 8> deriveAttrsInPostOrder(
    ArrayRef<Function *> Functions, ...) {
  SCCNodesResult Nodes = createSCCNodeSet(Functions);

  addArgumentReturnedAttrs(Nodes.SCCNodes, Changed);
  addMemoryAttrs(Nodes.SCCNodes, AARGetter, Changed);
  addArgumentAttrs(Nodes.SCCNodes, Changed, ...);
  addWillReturn(Nodes.SCCNodes, Changed);
  addNoAliasAttrs(Nodes.SCCNodes, Changed);
  addNonNullAttrs(Nodes.SCCNodes, Changed);
  addNoRecurseAttrs(Nodes.SCCNodes, Changed);
  // ...
  return Changed;
}

One function per attribute, each operating on a whole SCC at once. That last part is the subtle design decision.

Why SCC-at-a-time? Mutually recursive functions form an SCC. If you try to infer readonly for f without knowing whether g (which it calls) is readonly, and g’s analysis needs to know about f, you’re stuck. FunctionAttrs cuts the knot by treating the whole SCC as a unit: assume optimistically that all members of the SCC have the property, analyze each one, and if any analysis fails, retract the property from the whole SCC.

Inferring memory attributes

addMemoryAttrs (FunctionAttrs.cpp:276-313) is a good representative:

template <typename AARGetterT>
static void addMemoryAttrs(const SCCNodeSet &SCCNodes, ...) {
  MemoryEffects ME = MemoryEffects::none();

  for (Function *F : SCCNodes) {
    AAResults &AAR = AARGetter(*F);
    auto [FnME, FnRecursiveArgME] =
        checkFunctionMemoryAccess(*F, F->hasExactDefinition(), AAR, SCCNodes);
    ME |= FnME;
  }

  for (Function *F : SCCNodes) {
    MemoryEffects OldME = F->getMemoryEffects();
    MemoryEffects NewME = ME & OldME;
    if (NewME != OldME) {
      ++NumMemoryAttr;
      F->setMemoryEffects(NewME);
      Changed.insert(F);
    }
  }
}

Two passes through the SCC:

Collect. For each function, scan every instruction and collect what kinds of memory it accesses. Loads contribute Ref (reads), stores contribute Mod (writes), calls contribute whatever they claim via their own attributes. Calls within the SCC are ignored on this first pass — we’ll handle recursion conservatively.
Apply. Take the union of memory effects across the SCC. Every function in the SCC then gets attributed with that union. If the union is “reads only”, every function in the SCC is readonly. If “nothing”, they’re all readnone. Anything stronger than what each function already had is set.

This is where the “SCC as unit” strategy pays off. A cycle of mutually recursive readonly functions will have all of them analyzed as readonly simultaneously.

Inferring nonnull

addNonNullAttrs (FunctionAttrs.cpp:1639-1693) has a similar shape but with a speculate-then-verify wrinkle:

static void addNonNullAttrs(const SCCNodeSet &SCCNodes, ...) {
  bool SCCReturnsNonNull = true;

  for (Function *F : SCCNodes) {
    if (!isReturnNonNull(F, SCCNodes, Speculative))
      SCCReturnsNonNull = false;
  }

  if (SCCReturnsNonNull) {
    for (Function *F : SCCNodes) {
      F->addRetAttr(Attribute::NonNull);
      Changed.insert(F);
    }
  }
}

isReturnNonNull walks every return statement, tracing the returned value back through phis, selects, and calls. If all return values can be proven non-null (where a recursive call within the SCC is speculatively assumed to return non-null), the function’s return is nonnull. If any single function in the SCC fails the check, the whole SCC is disqualified.

The speculation is what makes recursive cases work. Otherwise a function like:

int *recursive(int n) {
  if (n == 0) return &global_int;
  return recursive(n - 1);
}

would be stuck: proving the return is nonnull requires knowing that the recursive call returns nonnull, which requires already knowing that the return is nonnull. FunctionAttrs handles this by assuming optimistically and then verifying. If the assumption would have been wrong, the “any function failed” check catches it.

The limit of FunctionAttrs

FunctionAttrs works well for attributes that can be computed by “look at each function once in SCC order”. But some attributes depend on each other in more complex ways:

Knowing an argument is nocapture might let you conclude the function is readonly.
Knowing the function is readonly might let you conclude an argument is nocapture (because pointer stores would be writes).

FunctionAttrs does some of this within a pass, but propagating across multiple attributes and multiple functions in arbitrary dependency patterns is where it runs out of steam. That’s where Attributor comes in.

Attributor: abstract interpretation with a fixpoint

The Attributor framework (Attributor.cpp, AttributorAttributes.cpp) is structured differently. Instead of one function per attribute, each attribute is an abstract attribute (AA) — a C++ object that represents the compiler’s current best knowledge about one property of one piece of IR.

The main classes are in Attributor.h:

struct AbstractAttribute {
  virtual ChangeStatus updateImpl(Attributor &A) = 0;

  virtual ChangeStatus manifest(Attributor &A) {
    return ChangeStatus::UNCHANGED;
  }

  virtual void trackStatistics() const = 0;
};

An AA has:

A state that starts pessimistic (everything might be bad) and refines toward optimistic (provably good). For example, AANoCapture tracks whether the value is captured in memory, in an integer representation, or through the return.
An updateImpl method that re-runs the analysis. It can query other AAs, read the IR, call alias analysis, whatever it needs. Returns CHANGED if the state got more precise.
A manifest method that writes the inferred attribute into the IR at the end.

The fixpoint loop

The driver is Attributor::runTillFixpoint (Attributor.cpp:2119-2258):

void Attributor::runTillFixpoint() {
  SetVector<AbstractAttribute *> Worklist;
  Worklist.insert_range(DG.SyntheticRoot);

  unsigned IterationCounter = 1;
  do {
    LLVM_DEBUG(dbgs() << "\n[Attributor] #Iteration: " << IterationCounter << "\n");

    for (AbstractAttribute *AA : Worklist) {
      if (!AA->getState().isAtFixpoint())
        if (updateAA(*AA) == ChangeStatus::CHANGED)
          ChangedAAs.push_back(AA);
    }

    Worklist.clear();
    Worklist.insert_range(ChangedAAs);

  } while (!Worklist.empty() && (IterationCounter++ < MaxIterations));
}

Classic worklist fixed-point iteration:

Start with all AAs on the worklist.
Update each. Anything whose state changed goes to ChangedAAs.
Next iteration’s worklist is the AAs that depended on any of the changed ones (tracked elsewhere).
Repeat until nothing changes or we hit a hard iteration limit.

The power of this design is that when AA X changes, all AAs that queried X get re-added automatically. You can have chains like:

AAMemoryLocation proves function F is readonly.
Next iteration, AANoCapture for F’s argument notices F is readonly, so stores of the pointer are impossible, so the argument is nocapture.
Iteration after that, AANonNull notices the argument is nocapture and can further refine its null analysis.

Each of these happens automatically via the dependency tracking; you never explicitly coordinate the order.

An example AA: AANoCapture

From AttributorAttributes.cpp:5923-6128:

struct AANoCaptureImpl : public AANoCapture {
  ChangeStatus updateImpl(Attributor &A) override {
    if (AA::isAssumedReadOnly(A, FnPos, *this, IsKnown)) {
      T.addKnownBits(NOT_CAPTURED_IN_MEM);
    }

    auto UseCheck = [&](const Use &U, bool &Follow) -> bool {
      return checkUse(A, T, U, Follow);
    };

    if (!A.checkForAllUses(UseCheck, *this, *V))
      return indicatePessimisticFixpoint();

    auto Assumed = S.getAssumed();
    S.intersectAssumedBits(T.getAssumed());
    return (Assumed == S.getAssumed()) ? ChangeStatus::UNCHANGED
                                       : ChangeStatus::CHANGED;
  }
};

Reading top to bottom:

Ask the Attributor: is the containing function known readonly? (This is a query to another AA, AAMemoryLocation.) If yes, stores of our pointer are impossible, so we know it can’t be captured via memory.
Walk every use of the pointer. checkForAllUses calls checkUse for each, which decides whether this particular use preserves the no-capture property (e.g., using the pointer as a load address is fine; storing it to a global captures it).
If any use can’t be shown safe, give up and take a pessimistic fixpoint.
Otherwise, intersect our running state S with the new information T. If it changed, return CHANGED.

The query in step 1 is what creates the dependency. If AAMemoryLocation for our function later refines (e.g., decides the function is also not writing memory through arguments), our updateImpl will be re-invoked.

Manifest phase

After the fixpoint loop settles, the Attributor enters its manifest phase (Attributor.cpp:2647-2677):

ChangeStatus Attributor::run() {
  Phase = AttributorPhase::UPDATE;
  runTillFixpoint();

  Phase = AttributorPhase::MANIFEST;
  ChangeStatus ManifestChange = manifestAttributes();

  Phase = AttributorPhase::CLEANUP;
  ChangeStatus CleanupChange = cleanupIR();

  return ManifestChange | CleanupChange;
}

manifestAttributes walks all AAs and, for each one at a “good” fixpoint, writes the corresponding attribute into the IR. AANoCapture calls F->addParamAttr(ArgNo, Attribute::NoCapture), AANonNull calls F->addRetAttr(Attribute::NonNull), and so on.

The separation between update and manifest is important: during update, the IR hasn’t been modified, so AAs can query the “raw” code and reach their conclusions based on a stable state. Only when everything has converged do we actually emit the attributes.

A concrete test case

From llvm/test/Transforms/FunctionAttrs/nocapture.ll:

; Before attribute inference
define void @c2(ptr %q) {
  store ptr %q, ptr @g
  ret void
}

define void @c3(ptr %q) {
  call void @c2(ptr %q)
  ret void
}

What we want to conclude:

@c2 stores its argument to a global but doesn’t read from it. So it’s writeonly with respect to @g, and its argument %q is captured (it escaped to @g). But @c2 itself doesn’t read from the pointer, so arg attribute is writeonly.
@c3 just forwards to @c2. If @c2’s argument is writeonly, and nothing else touches %q in @c3, then @c3’s argument is also writeonly.

After running the FunctionAttrs pass (and the Attributor on top):

define void @c2(ptr nofree writeonly %q) {
  store ptr %q, ptr @g
  ret void
}

define void @c3(ptr %q) memory(write) {
  call void @c2(ptr nofree writeonly %q)
  ret void
}

Several inferences made:

@c2’s %q got writeonly and nofree.
@c2 overall got no memory attribute beyond what its body implies (it writes to @g).
@c3 got memory(write) — the overall function only writes, consistent with the single call’s effects.
The call site call void @c2(...) also got writeonly and nofree attributes, informing the caller’s own analyses.

FunctionAttrs vs Attributor in practice

Both passes run in the standard LLVM pipeline. FunctionAttrs is faster and handles the common cases. Attributor is slower but more precise, especially for attributes with cyclic dependencies. You usually see FunctionAttrs run first, then Attributor tightens things up.

The practical upshot: when you look at optimized IR and see nocapture on an argument, odds are FunctionAttrs put it there if the analysis was simple, or Attributor if it required iterative refinement. For most C/C++ code, both conclude the same thing and FunctionAttrs is sufficient. For heavily templated C++ (lots of small functions calling each other), Attributor’s iterative approach wins out.

The bigger picture

Function attribute inference is what makes interprocedural optimization work. Without it, most downstream passes would have to assume the worst about called functions — that they might read anything, write anything, free anything, capture any pointer passed in. With inferred attributes, passes can make targeted assumptions and run aggressively.

The dependency chain often looks like this:

A simple arithmetic function gets inferred readnone.
That enables a caller to be inferred readonly.
That enables LICM to hoist calls out of loops.
That changes the shape of code enough that another function qualifies for inlining.
After inlining, SROA sees clean allocas and eliminates more memory.
GVN cleans up the resulting code.

All of this from one readnone attribute that FunctionAttrs inferred in less than a millisecond. Interprocedural attributes are force multipliers.

Where LLVM's readonly, nocapture, and nonnull Come From