SelectionDAG Legalization: Turning IR Into Something the Target Can Actually Run || Packed Bits

Most of my blog posts so far have been about LLVM’s middle-end — analyses and optimizations that work on the IR. This one crosses into the backend. After the optimizer is done, the compiler hands the IR to a target-specific code generator, and one of the first things the backend does is build a SelectionDAG and start legalizing it.

“Legalizing” means: the DAG initially contains types and operations that the optimizer produced freely, without caring whether the target supports them. A 32-bit ARM can’t natively multiply two i64s. An x86 without AVX can’t do a v3i32 add. Those nodes have to be rewritten into a sequence the target’s instruction set can actually express. That’s what legalization does, and it does it in two sweeping phases: type legalization first, then operation legalization.

This post is a guided tour. The relevant files are all under llvm/lib/CodeGen/SelectionDAG/: LegalizeTypes.cpp, LegalizeIntegerTypes.cpp, LegalizeVectorTypes.cpp, and LegalizeDAG.cpp. Target-side setup lives in llvm/include/llvm/CodeGen/TargetLowering.h and each target’s *ISelLowering.cpp.

Background: the SelectionDAG

Before legalization, instruction selection’s first step has already built a DAG where each node corresponds to an operation (ISD::ADD, ISD::LOAD, ISD::CALL, …) and each value has a Machine Value Type (MVT): i32, i64, v4i32, etc. The DAG is logically what the source code wanted to compute, in a form closer to the target but still target-independent.

But the target doesn’t necessarily support every ISD::* opcode with every MVT combination. ISD::MUL on v4i32 is supported on x86 with SSE4.1 but not without. ISD::CTPOP on i32 is one instruction on some targets and a bit-manipulation sequence on others. ISD::FADD on f16 might need to be expanded into f32 operations on hardware without native half-precision.

Legalization’s job is to transform the DAG so every remaining operation is something the target can select.

The four actions

Every (opcode, type) pair has a legalization action from the target’s perspective. The enum is in TargetLowering.h:203-208:

enum LegalizeAction : uint8_t {
  Legal,      // The target natively supports this operation.
  Promote,    // This operation should be executed in a larger type.
  Expand,     // Try to expand this to other ops, otherwise use a libcall.
  LibCall,    // Don't try to expand this to other ops, always use a libcall.
  Custom      // Use the LowerOperation hook to implement custom lowering.
};

Targets declare these actions at startup using setOperationAction() in their TargetLowering constructor. For example, from X86ISelLowering.cpp:

X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
                                     const X86Subtarget &STI)
    : TargetLowering(TM, STI), Subtarget(STI) {
  // X86 doesn't support truncating stores from i64
  setTruncStoreAction(MVT::i64, MVT::i32, Expand);
  setTruncStoreAction(MVT::i64, MVT::i16, Expand);
  setTruncStoreAction(MVT::i64, MVT::i8,  Expand);

  setOperationAction(ISD::CLEAR_CACHE, MVT::Other, Expand);
  // ... hundreds more ...
}

This is, effectively, the target’s hardware manual translated into C++. The legalizer reads it to decide what to do with each node.

Phase 1: type legalization

The first phase makes sure every value in the DAG has a legal type — one the target can hold in a register (or a known combination of registers). The entry point is DAGTypeLegalizer::run() (LegalizeTypes.cpp:200-294):

for (unsigned i = 0, NumResults = N->getNumValues(); i < NumResults; ++i) {
  EVT ResultVT = N->getValueType(i);
  switch (getTypeAction(ResultVT)) {
  case TargetLowering::TypeLegal:
    break;
  case TargetLowering::TypePromoteInteger:
    PromoteIntegerResult(N, i);
    Changed = true;
    goto NodeDone;
  case TargetLowering::TypeExpandInteger:
    ExpandIntegerResult(N, i);
    Changed = true;
    goto NodeDone;
  case TargetLowering::TypeSplitVector:
    SplitVectorResult(N, i);
    Changed = true;
    goto NodeDone;
  case TargetLowering::TypeWidenVector:
    WidenVectorResult(N, i);
    Changed = true;
    goto NodeDone;
  // ... soften float, expand float, scalarize vector ...
  }
}

The type actions fall into a few families:

Promote — a small type becomes a larger legal type. Example: i1 → i8, or i8 → i32 on targets that only have word-sized registers. The operation is performed at the larger width; the result is truncated (implicitly or explicitly) when needed.

Expand — a large integer is split into pieces. Example: i64 → two i32s on a 32-bit target. The DAG gets restructured so every i64 value is represented as a pair (Lo, Hi) of i32 values.

Split vector — a wide vector becomes two narrower vectors. Example: v8i32 → two v4i32s.

Widen vector — an undersized vector gets padded to the next legal size. Example: v3i32 → v4i32, with the extra lane carrying undef.

Soften float — floats that the target can’t handle directly get represented as their integer bit patterns (e.g., f16 → i16). Operations then become calls to compiler-rt functions (__mulsf3, etc.).

Each action has a dispatcher that pattern-matches on opcode. PromoteIntegerResult is the model (LegalizeIntegerTypes.cpp:42):

void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
  LLVM_DEBUG(dbgs() << "Promote integer result: "; N->dump(&DAG));
  SDValue Res = SDValue();

  if (CustomLowerNode(N, N->getValueType(ResNo), true)) {
    LLVM_DEBUG(dbgs() << "Node has been custom expanded, done\n");
    return;
  }

  switch (N->getOpcode()) {
  case ISD::BITCAST:     Res = PromoteIntRes_BITCAST(N); break;
  case ISD::BSWAP:       Res = PromoteIntRes_BSWAP(N); break;
  case ISD::CTPOP:       Res = PromoteIntRes_CTPOP_PARITY(N); break;
  case ISD::LOAD:        Res = PromoteIntRes_LOAD(cast<LoadSDNode>(N)); break;
  // ... 80+ opcodes ...
  }

  ReplaceNodeWith(N, Res);
}

One helper per opcode, each one doing the specific rewrite for that operation at a promoted type. Promoting an add means zero- or sign-extending both operands, doing the add at the larger width, and remembering the result is “really” narrower.

Example: i64 multiply on a 32-bit target

ExpandIntegerResult (LegalizeIntegerTypes.cpp:3048) handles the expand case by returning two SDValues, the low and high halves of the original value:

void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
  SDValue Lo, Hi;
  Lo = Hi = SDValue();

  switch (N->getOpcode()) {
  case ISD::ADD:
  case ISD::SUB: ExpandIntRes_ADDSUB(N, Lo, Hi); break;
  case ISD::MUL:         ExpandIntRes_MUL(N, Lo, Hi); break;
  // ...
  }
}

For multiply specifically (LegalizeIntegerTypes.cpp:4528):

void DAGTypeLegalizer::ExpandIntRes_MUL(SDNode *N,
                                        SDValue &Lo, SDValue &Hi) {
  EVT VT = N->getValueType(0);
  EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);

  SDValue LL, LH, RL, RH;
  GetExpandedInteger(N->getOperand(0), LL, LH);
  GetExpandedInteger(N->getOperand(1), RL, RH);

  if (TLI.expandMUL(N, Lo, Hi, NVT, DAG, ..., LL, LH, RL, RH))
    return;

  RTLIB::Libcall LC = RTLIB::getMUL(VT);
  // falls through to a libcall to __muldi3
}

Conceptually, an i64 × i64 → i64 multiply, when decomposed into two i32 halves L = (LL, LH) and R = (RL, RH), becomes:

result = LL*RL + (LL*RH + LH*RL) << 32
       + (LH*RH) << 64     // top 64 bits, discarded for a 64-bit result

Each partial product is an i32×i32 operation (legal on the target). Three 32-bit multiplies and some additions replace a single i64 multiply. The legalizer generates the DAG for that arithmetic and the result is a pair of i32 SDValues. When the calling context later reads “the i64 result of this multiply”, it transparently reads both halves.

If the target doesn’t even have the right i32 operations — for very small chips — expandMUL returns false and we fall through to a libcall.

Vector operations

Vector legalization is similar in spirit but fancier. SplitVectorResult (LegalizeVectorTypes.cpp:1248) splits a too-wide vector into halves:

void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
  switch (N->getOpcode()) {
  case ISD::ADD:
  case ISD::SUB:
    // Recursively split operands, perform op on each half
    break;
  case ISD::EXTRACT_VECTOR_ELT:
    R = SplitVecRes_EXTRACT_VECTOR_ELT(N);
    break;
  // ... 60+ opcodes
  }
}

WidenVectorResult (LegalizeVectorTypes.cpp:4955) pads an undersized vector up to the next legal size, usually with poison in the unused lanes. For most arithmetic, the extra lanes don’t affect correctness because you’re only going to use the first N lanes anyway.

Phase 2: operation legalization

Once every value has a legal type, Phase 2 ensures every operation can actually be executed. That’s SelectionDAGLegalize::Legalize() (LegalizeDAG.cpp:985-1401):

void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
  // ... determine Action for this node ...

  switch (Action) {
  case TargetLowering::Legal:
    LLVM_DEBUG(dbgs() << "Legal node: nothing to do\n");
    return;

  case TargetLowering::Custom:
    LLVM_DEBUG(dbgs() << "Trying custom legalization\n");
    if (SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG)) {
      ReplaceNode(SDValue(Node, 0), Res);
      return;
    }
    [[fallthrough]];

  case TargetLowering::Expand:
    if (ExpandNode(Node))
      return;
    [[fallthrough]];

  case TargetLowering::Promote:
    PromoteNode(Node);
    return;

  case TargetLowering::LibCall:
    ConvertNodeToLibcall(Node);
    return;
  }
}

The switch’s fall-throughs are important: Custom falls through to Expand if the custom handler declined. Expand falls through to Promote if the target-independent expansion wasn’t available. LibCall generates a runtime call. It’s a waterfall from most-target-specific to most-target-generic.

The `ExpandNode` generic expansion

ExpandNode (LegalizeDAG.cpp:3236-3400+) has target-independent implementations for many opcodes. For example, ISD::CTPOP (population count, i.e., “how many bits are set”):

case ISD::CTPOP:
  if ((Tmp1 = TLI.expandCTPOP(Node, DAG)))
    Results.push_back(Tmp1);
  break;

expandCTPOP in TargetLowering.cpp implements the classic bit-manipulation sequence:

x = x - ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x + (x >> 4)) & 0x0F0F0F0F;
x = x * 0x01010101 >> 24;   // sum all bytes into top byte

So on a target without a popcnt instruction, a single ISD::CTPOP node becomes a chain of shifts, masks, and adds. All legal, all supported on any target that has basic integer arithmetic.

Custom lowering

When setOperationAction(..., ..., Custom) is declared, the legalizer calls TargetLowering::LowerOperation(). Each target implements this as a big switch:

SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
  switch (Op.getOpcode()) {
  // ... 200+ cases ...
  case ISD::MUL:
    return LowerMUL(Op, Subtarget, DAG);
  // ...
  }
}

LowerMUL in x86 is over 100 lines of logic that picks between IMUL, LEA-based multiply-by-constant patterns, SSE/AVX sequences, or libcalls, depending on operand types, constants, and subtarget features. This is where targets encode their hand-tuned lowering rules.

Libcalls

Some things just can’t be expanded — you need to call a library function. The canonical example is floating-point operations on chips without an FPU. ConvertNodeToLibcall (LegalizeDAG.cpp:4954) generates the runtime call:

// For unsupported FP: call __addsf3 (from compiler-rt) with both operands,
// get back the result as an integer of the same width, treat that as f32.

Every target configures which runtime library it expects; compiler-rt, libgcc, and MSVCRT provide the usual functions.

A worked example: v3i32 add on x86

Let’s trace a tiny one. LLVM IR:

define <3 x i32> @add3(<3 x i32> %a, <3 x i32> %b) {
  %r = add <3 x i32> %a, %b
  ret <3 x i32> %r
}

After instruction selection’s initial DAG building, the operation is ISD::ADD on type v3i32. x86 doesn’t have a native v3i32 — its SSE vector types come in widths of 128 bits (v4i32), 256 (v8i32), etc.

Phase 1 (types). x86’s TargetLowering declares v3i32 as TypeWidenVector. The legalizer calls WidenVectorResult, which:

Pads the v3i32 operands to v4i32 (extra lane = poison).
Rebuilds the ISD::ADD as an operation on v4i32.
Arranges for the extra lane of the result to be discarded when used.

Phase 2 (operations). Now the node is ISD::ADD on v4i32. x86 with SSE2 has a native paddd instruction for this, so the action is Legal. Nothing to do.

Result DAG: a single legal v4i32 add, with pad/trim operations around it. Instruction selection then matches it to PADDD xmm, xmm.

The “before” and “after” for this function at the DAG level would look roughly like:

Before legalization:
  ISD::ADD (v3i32, v3i32) -> v3i32

After type legalization:
  ISD::CONCAT_VECTORS pad %a to v4i32
  ISD::CONCAT_VECTORS pad %b to v4i32
  ISD::ADD (v4i32, v4i32) -> v4i32
  ISD::EXTRACT_SUBVECTOR back to v3i32 (or just use first 3 lanes)

After operation legalization:
  (no further change — v4i32 ADD is legal)

For more complex cases — imagine v3i64 on a 32-bit target — you’d get both phases doing substantial work: type legalization splits each lane into (Lo, Hi), then maybe widens the vector, and operation legalization might expand the multiply into libcalls. The pipeline is iterative enough to handle arbitrary chains of illegality.

Why this architecture works

The two-phase design — types first, operations second — is the right factoring because:

Types are more “viral” than operations. Changing a value’s type affects every user of that value. Doing this first keeps operation legalization simpler: it can assume types are already legal.
Operations are per-node. Once types are settled, deciding how to handle one ADD doesn’t interact with deciding how to handle the next ADD.
The action enum (Legal/Promote/Expand/Custom/LibCall) is uniform. Each opcode, each type, a single knob per target. Targets declare their rules in one place and the framework handles the dispatch.

The cost of this architecture is that the SelectionDAG backend is big. LegalizeDAG.cpp and friends together are tens of thousands of lines. But it scales: adding a new target is mostly a matter of implementing TargetLowering’s hooks and writing a bunch of setOperationAction calls, and you get all the generic legalization machinery for free.

What comes after

After legalization, the DAG has only legal types and legal operations. The next phase is instruction selection proper: pattern-match the DAG against the target’s instruction definitions (written in TableGen) and emit actual machine instructions. Then scheduling, register allocation, and emission.

But legalization is what bridges the optimizer’s abstract “here’s what the program computes” view and the scheduler’s concrete “here’s what this chip can execute” view. Understanding it is the key that unlocks reading LLVM’s backends.

SelectionDAG Legalization: Turning IR Into Something the Target Can Actually Run