ADR 0018: Document Tree Code Generation (Wadler-Lindig Pretty Printer)

Status

Implemented (2026-02-15)

Context

Beamtalk generates Core Erlang text (ADR 0003) via direct string emission — 1,100+ write!/writeln! macro calls across 28 files in crates/beamtalk-core/src/codegen/core_erlang/. The current architecture uses a single String output buffer with manual indentation tracking:

// Current approach: imperative string building
pub(super) struct CoreErlangGenerator {
    output: String,       // Direct string buffer
    indent: usize,        // Manual indentation counter
    // ...
}

// Example: generating a method table
writeln!(self.output, "'method_table'/0 = fun () ->")?;
self.indent += 1;
self.write_indent()?;
write!(self.output, "~{{")?;
for (i, (name, arity)) in methods.iter().enumerate() {
    if i > 0 { write!(self.output, ", ")?; }
    write!(self.output, "'{name}' => {arity}")?;
}
writeln!(self.output, "}}~")?;
self.indent -= 1;

Problems with Direct String Emission

1. Indentation is fragile. The self.indent counter must be manually incremented/decremented in matched pairs. Forgetting a decrement or nesting incorrectly produces malformed Core Erlang that erlc rejects — but the Rust code compiles fine. This class of bug is invisible at compile time.

2. No composability. Code fragments can't be built independently and combined. Every write! call mutates a shared String buffer, so you can't build a function body, inspect it, test it in isolation, or compose it with other fragments. Adding new codegen features requires understanding the mutation flow across multiple files. In practice, AI agents working on codegen issues struggle with the dense write!/writeln! + manual indent patterns — the code is hard to read and reason through, increasing the risk of subtle indentation errors that compile as valid Rust but produce invalid Core Erlang.

3. Testing requires full string comparison. The codegen subsystem has 196 snapshot tests and 170 unit tests. There's no way to unit-test individual code generation fragments (e.g., "does this method table generate correctly?") without running the full pipeline.

4. The codebase is large and growing. 1,100+ write!/writeln! calls across 28 files, with the heaviest files being:

| File | write! calls | Purpose |
|---|---|---|
| value_type_codegen.rs | 204 | Value type modules |
| primitive_implementations.rs | 136 | Intrinsic method bodies |
| counted_loops.rs | 126 | Loop codegen |
| gen_server/methods.rs | 98 | Method dispatch |
| intrinsics.rs | 80 | Intrinsic dispatch |

As the language grows (pattern matching, exception handling, type annotations), this approach will become increasingly difficult to maintain.

Note: Some specific refactoring tasks (e.g., module renaming) can be addressed independently by extending existing abstractions like ModuleName in erlang_types.rs. The document tree is a long-term architectural improvement for the codegen subsystem as a whole.

What Other Rust-Based Compilers Do

Gleam (Rust → Erlang/JS) uses a Document algebraic data type based on Wadler-Lindig's "Strictly Pretty" algorithm. Code generation builds a tree of document nodes, then renders once:

// Gleam's approach — declarative document composition
let module = docvec![
    header,
    "-compile([no_auto_import, nowarn_unused_vars]).",
    line(),
    exports,
    join(statements, lines(2)),
];
module.to_pretty_string(80)

Gleam's Document enum has variants: Str, Line, Nest, Group, Vec, Break. The docvec! macro provides ergonomic composition. Gleam uses this for both Erlang and JavaScript backends — same Document type, different rendering. Gleam's implementation is ~875 lines including tests.

Other compilers follow the same approach; see the Prior Art table below.

Pattern: Modern Rust compilers overwhelmingly use document trees for text code generation, not direct string emission.

Decision

Replace direct write!/writeln! string emission with a Wadler-Lindig document tree for Core Erlang code generation.

Introduce a Document enum (or use the pretty crate) and a docvec! macro for composing Core Erlang output. The code generator builds a tree of Document nodes, which is rendered to a string in a final pass.

What This Looks Like

Before (current):

fn generate_method_table(&mut self, methods: &[(String, usize)]) -> Result<()> {
    writeln!(self.output, "'method_table'/0 = fun () ->")?;
    self.indent += 1;
    self.write_indent()?;
    write!(self.output, "~{{")?;
    for (i, (name, arity)) in methods.iter().enumerate() {
        if i > 0 { write!(self.output, ", ")?; }
        write!(self.output, "'{name}' => {arity}")?;
    }
    writeln!(self.output, "}}~")?;
    self.indent -= 1;
    Ok(())
}

After (document tree):

fn generate_method_table(&self, methods: &[(String, usize)]) -> Document {
    let entries = join(
        methods.iter().map(|(name, arity)| docvec!["'", name, "' => ", arity]),
        ", "
    );
    docvec![
        "'method_table'/0 = fun () ->",
        nest(INDENT, docvec![line(), "~{", entries, "}~"]),
    ]
}

Key differences:

Note on mutable generator state: CoreErlangGenerator holds mutable state beyond the output buffer — var_context (variable scoping/fresh names) and state_threading (field assignment state variables). Functions that use these will still require &mut self even after adopting Document return types. The migration decouples output construction from state mutation but does not eliminate all &mut self methods. The write_document() bridge (Phase 0) accommodates this: Document-returning functions can still take &mut self when they need to generate fresh variable names.
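The shape described above can be sketched in miniature. This is illustrative only: `VarContext`, `fresh`, and `generate_let` are assumed names, and a plain `String` stands in for the real `Document` type. The point is that the method returns a composable fragment yet still takes `&mut self` because it allocates a fresh variable name.

```rust
struct VarContext {
    counter: usize,
}

impl VarContext {
    // Fresh names require mutation: V1, V2, ...
    fn fresh(&mut self, prefix: &str) -> String {
        self.counter += 1;
        format!("{}{}", prefix, self.counter)
    }
}

// Stand-in for the real Document type: fragments as plain owned strings.
type Document = String;

struct Generator {
    var_context: VarContext,
}

impl Generator {
    // Returns a fragment value, but still takes &mut self because it
    // allocates a fresh variable name from the shared context.
    fn generate_let(&mut self, value: &str, body: &str) -> Document {
        let var = self.var_context.fresh("V");
        format!("let <{var}> = {value} in {body}")
    }
}
```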

Document Type

A Document enum for Core Erlang generation (based on Gleam's approach, ~250 lines):

pub enum Document<'a> {
    /// A string literal
    Str(&'a str),
    /// An owned string
    String(String),
    /// A newline followed by current indentation
    Line,
    /// Increase indentation for nested content
    Nest(isize, Box<Document<'a>>),
    /// A sequence of documents
    Vec(Vec<Document<'a>>),
    /// A group that can be rendered flat or broken across lines
    Group(Box<Document<'a>>),
    /// A break point — rendered as a space when flat, newline when broken
    Break(&'a str),
    /// Empty document
    Nil,
}

While Core Erlang mostly has fixed formatting, Group and Break are included from the start because pattern matching compilation (planned soon) will generate deeply nested case expressions where readable line-breaking is needed. Including these variants now (~50 extra lines) avoids a disruptive retrofit later and follows Gleam's proven design.
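To make the rendering model concrete, here is a simplified, self-contained sketch of a renderer for these variants. It is not the real algorithm: full Wadler-Lindig tracks the current column and remaining line width, whereas this version measures each group against the whole line width. It uses `&'static str` only, for brevity.

```rust
enum Doc {
    Str(&'static str),
    Line,
    Nest(isize, Box<Doc>),
    Vec(Vec<Doc>),
    Group(Box<Doc>),
    Break(&'static str),
    Nil,
}

// Width of a document rendered entirely flat; None if it contains a hard
// Line and therefore can never be flat.
fn flat_width(d: &Doc) -> Option<usize> {
    match d {
        Doc::Str(s) | Doc::Break(s) => Some(s.len()),
        Doc::Line => None,
        Doc::Nest(_, inner) | Doc::Group(inner) => flat_width(inner),
        Doc::Vec(ds) => ds.iter().map(flat_width).sum(),
        Doc::Nil => Some(0),
    }
}

fn render(d: &Doc, width: usize, indent: isize, broken: bool, out: &mut String) {
    match d {
        Doc::Str(s) => out.push_str(s),
        Doc::Nil => {}
        Doc::Line => {
            out.push('\n');
            for _ in 0..indent {
                out.push(' ');
            }
        }
        // A Break is its separator string when flat, a newline plus
        // indentation when the enclosing group is broken.
        Doc::Break(s) => {
            if broken {
                out.push('\n');
                for _ in 0..indent {
                    out.push(' ');
                }
            } else {
                out.push_str(s);
            }
        }
        Doc::Nest(n, inner) => render(inner, width, indent + n, broken, out),
        Doc::Vec(ds) => {
            for child in ds {
                render(child, width, indent, broken, out);
            }
        }
        // A Group renders flat if its flat width fits the line width.
        Doc::Group(inner) => {
            let fits = flat_width(inner).map_or(false, |w| w <= width);
            render(inner, width, indent, !fits, out);
        }
    }
}
```

The same document thus produces single-line output when it fits and multi-line output when it does not, with no call site tracking indentation by hand.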

Approach: Roll Our Own vs. Use a Crate

Recommended: Roll a focused implementation (~250 lines), following Gleam's proven design.

| Option | Pros | Cons |
|---|---|---|
| pretty crate | Full Wadler-Lindig, well-tested | Adds dependency; includes features we don't need (ForceBroken, FlexBreak) |
| Gleam-style focused | What we need, including Group/Break; easy to understand | Must write ~250 lines |
| Full Wadler-Lindig custom | Future-proof for formatting tools | Unnecessary complexity |

We need: Str, String, Line, Nest, Vec, Group, Break, Nil, and a docvec! macro.
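A minimal docvec! could look like the following sketch. This is an assumption about the eventual shape, not the final macro: each argument is converted through From conversions into a Document, and the results are collected into Document::Vec. The real implementation would cover more source types and would also accept existing Document values.

```rust
enum Document {
    String(String),
    Vec(Vec<Document>),
    Nil,
}

impl From<&str> for Document {
    fn from(s: &str) -> Self {
        Document::String(s.to_string())
    }
}

impl From<String> for Document {
    fn from(s: String) -> Self {
        Document::String(s)
    }
}

impl From<usize> for Document {
    fn from(n: usize) -> Self {
        Document::String(n.to_string())
    }
}

// docvec!-style macro: convert each argument into a Document, collect into
// a Vec node. An empty invocation yields the empty document.
macro_rules! docvec {
    () => { Document::Nil };
    ($($part:expr),+ $(,)?) => {
        Document::Vec(vec![$(Document::from($part)),+])
    };
}

impl Document {
    // Flat rendering, sufficient for this sketch (no Line/Nest handling).
    fn render(&self) -> String {
        match self {
            Document::String(s) => s.clone(),
            Document::Vec(ds) => ds.iter().map(Document::render).collect(),
            Document::Nil => String::new(),
        }
    }
}
```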

Prior Art

| Compiler | Language | Approach | Notes |
|---|---|---|---|
| Gleam | Rust → Erlang/JS | Custom Document tree (~875 lines) | Wadler-Lindig with docvec! macro; shared across Erlang + JS backends |
| rustc | Rust → LLVM | rustc_ast_pretty module | Document tree for AST pretty-printing |
| prettyplease | Rust syn → Rust | Wadler-style documents | Formats generated Rust code |
| Elm compiler | Haskell → JS | Wadler pretty-printer | Standard in Haskell ecosystem |
| PureScript | Haskell → JS | Doc type with render | Composable document fragments |
| OCaml compiler | OCaml → native | Format module | Built-in pretty-printing with boxes |

Universal pattern: Compilers that emit text-based output use document trees. Direct string concatenation is the exception, not the norm.

Gleam's evolution: Gleam started with simpler codegen and grew into the Document approach as complexity increased. Beamtalk is at a similar inflection point — 1,100+ write calls across 28 files.

Why not a full Core Erlang IR? The prior art compilers listed above all use document trees rather than typed target-language IRs for their text backends. A typed Core Erlang IR (Alternative 2, below) would only become valuable if beamtalk needed to transform or optimize the generated Core Erlang before emission — which it currently doesn't, since erlc handles all optimization passes.

User Impact

This is a purely internal refactoring — it changes how the compiler generates Core Erlang, not what it generates. No user-facing behavior changes.

| Persona | Impact |
|---|---|
| Newcomer | None — same REPL, same error messages, same compiled output |
| Smalltalk developer | None — language semantics and syntax unchanged |
| Erlang/BEAM developer | None — generated Core Erlang is byte-for-byte identical |
| Operator | None — no runtime impact; fewer codegen bugs mean fewer bad BEAM files in production |
| Compiler contributor | Significant improvement — codegen is easier to read, write, test, and refactor |

Contributor Experience (Primary Beneficiary)

Before: Adding a new codegen feature requires:

  1. Understanding the self.output mutation flow across multiple files
  2. Manually tracking indentation state
  3. Writing snapshot tests that compare entire generated files
  4. Risk of indentation bugs that produce valid Rust but invalid Core Erlang

After: Adding a new codegen feature requires:

  1. Writing a function that returns a Document
  2. Composing it with existing document fragments using docvec!
  3. Unit testing the fragment in isolation
  4. Letting the renderer handle indentation declaratively
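Step 3 above can be sketched concretely. The names (`method_table_entries`, `join`, `text`) and the minimal Document stand-in here are assumptions for illustration; the point is that a fragment is a plain value, so a unit test builds it and asserts on its rendering with no generator state or full-pipeline snapshot involved.

```rust
enum Document {
    Text(String),
    Vec(Vec<Document>),
}

impl Document {
    fn render(&self) -> String {
        match self {
            Document::Text(s) => s.clone(),
            Document::Vec(ds) => ds.iter().map(Document::render).collect(),
        }
    }
}

fn text(s: impl Into<String>) -> Document {
    Document::Text(s.into())
}

// Interleave a separator between fragments, as a join() helper might.
fn join(docs: Vec<Document>, sep: &str) -> Document {
    let mut parts = Vec::new();
    for (i, d) in docs.into_iter().enumerate() {
        if i > 0 {
            parts.push(text(sep));
        }
        parts.push(d);
    }
    Document::Vec(parts)
}

// The fragment under test: method-table entries as a document value.
fn method_table_entries(methods: &[(&str, usize)]) -> Document {
    join(
        methods
            .iter()
            .map(|(name, arity)| text(format!("'{name}' => {arity}")))
            .collect(),
        ", ",
    )
}
```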

Steelman Analysis

The Strongest Argument Against This ADR

This is a pure refactoring of the largest subsystem in a ~40k-line compiler that has multiple active epics of unimplemented language features. The refactoring produces zero user-visible value. Every hour spent migrating write! calls is an hour not spent on pattern matching, type inference, or the features that will determine whether Beamtalk has users.

This is a legitimate concern. The ADR proceeds despite it because:

  1. The migration is designed to be organic, not dedicated — new code uses Document, old code migrates opportunistically during feature work
  2. Codegen is the subsystem that every language feature touches — improving its architecture reduces the cost of all future features
  3. The Document type itself is ~250 lines of net-new code; the migration cost is spread across feature PRs, not front-loaded

Option A: Document Tree (Recommended)

Option B: Keep write! (Status Quo)

Option C: Helper Methods Only (No Document Tree)

Tension Points

Alternatives Considered

1. Helper Methods on CoreErlangGenerator (80/20 Solution)

Extract helper methods that encapsulate common patterns without introducing a new intermediate representation:

impl CoreErlangGenerator {
    fn indented(&mut self, body: impl FnOnce(&mut Self) -> Result<()>) -> Result<()> {
        self.indent += 1;
        body(self)?;
        self.indent -= 1;
        Ok(())
    }

    fn emit_call(&mut self, module: &str, function: &str, args: &[&str]) -> Result<()> { ... }
    fn emit_let(&mut self, var: &str, body: impl FnOnce(&mut Self) -> Result<()>) -> Result<()> { ... }
}

Partially adopted: indented() and targeted helpers should be introduced regardless — they provide immediate value at zero risk and can be adopted one call site at a time. However, helpers alone don't solve composability (fragments can't be returned, stored, or tested independently) or the fundamental problem of interleaved mutation. Helper methods are a stepping stone, not the destination.
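A self-contained usage sketch of the indented() idea (the `Gen` struct and `line` helper here are simplified stand-ins for CoreErlangGenerator): because the closure scopes the indentation change, the increment and decrement cannot be mismatched, which is exactly the Problem #1 failure mode.

```rust
use std::fmt::Write;

struct Gen {
    output: String,
    indent: usize,
}

impl Gen {
    // Run body at one deeper indentation level; the decrement is guaranteed
    // to pair with the increment.
    fn indented(&mut self, body: impl FnOnce(&mut Self) -> std::fmt::Result) -> std::fmt::Result {
        self.indent += 1;
        body(self)?;
        self.indent -= 1;
        Ok(())
    }

    // Emit one line at the current indentation.
    fn line(&mut self, text: &str) -> std::fmt::Result {
        writeln!(self.output, "{}{}", "    ".repeat(self.indent), text)
    }
}
```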

2. Template Engine (Tera, Askama)

Use a template engine with Core Erlang templates containing placeholders.

// hypothetical template
module '{{ module_name }}' [{{ exports }}]
  attributes [{{ attributes }}]

{% for function in functions %}
'{{ function.name }}'/{{ function.arity }} = fun ({{ function.params }}) ->
    {{ function.body }}
{% endfor %}

Rejected because:

3. Typed Core Erlang IR

Build a full typed intermediate representation of Core Erlang:

enum CoreExpr {
    Let { var: String, value: Box<CoreExpr>, body: Box<CoreExpr> },
    Apply { fun: Box<CoreExpr>, args: Vec<CoreExpr> },
    Case { expr: Box<CoreExpr>, clauses: Vec<CoreClause> },
    Map { pairs: Vec<(CoreExpr, CoreExpr)> },
    Literal(CoreLiteral),
    // ...
}

Rejected because:

A typed IR may become valuable later (for optimization passes, multiple backends), but it's premature now. The document tree doesn't preclude adding an IR later — they serve different purposes.

4. Use the pretty Crate

Use the existing pretty crate from crates.io instead of rolling our own.

Not adopted because:

5. Incremental Adoption — write! + Document Hybrid

Keep write! for existing code, use Document only for new code.

Partially adopted: The migration strategy (see Implementation) is incremental. But the end goal is full migration — a permanent hybrid would be confusing for contributors who must learn both patterns.

6. Builder Pattern

Use a builder API that wraps String construction with composable methods:

CoreErlangBuilder::new()
    .line("'method_table'/0 = fun () ->")
    .indent(|b| {
        b.text("~{")
         .join(methods.iter(), ", ", |(name, arity)| format!("'{name}' => {arity}"))
         .text("}~")
    })
    .build()  // → String

Not adopted because:

What it does better than write!: Eliminates manual indent += 1 / indent -= 1 pairs, so it solves Problem #1 (fragile indentation). If composability and fragment testing are not priorities, a builder is a pragmatic middle ground.

Consequences

Positive

Negative

Neutral

Implementation

Adoption Strategy: New Code First, Organic Migration

Rather than a dedicated multi-phase migration project that competes with feature work, adopt the document tree organically:

Phase 0: Foundation (~S — single PR)

  1. Add Document enum (with Group/Break variants) and docvec! macro in crates/beamtalk-core/src/codegen/core_erlang/document.rs
  2. Implement to_string() / render() for the document type, including group/break rendering logic
  3. Add unit tests for the document primitives
  4. Add a write_document() bridge method to CoreErlangGenerator that renders a Document to self.output — enabling gradual per-function migration
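The bridge in item 4 could be sketched as follows. Names, the Document shape, and the four-space indent unit are assumptions, not the final API; the point is that a Document fragment is rendered at the generator's current indent and appended to the legacy buffer, so call sites can migrate one function at a time.

```rust
enum Document {
    Str(&'static str),
    Line,
    Vec(Vec<Document>),
}

impl Document {
    // Render with a fixed indent applied after each Line.
    fn render(&self, indent: usize, out: &mut String) {
        match self {
            Document::Str(s) => out.push_str(s),
            Document::Line => {
                out.push('\n');
                out.push_str(&"    ".repeat(indent));
            }
            Document::Vec(ds) => {
                for d in ds {
                    d.render(indent, out);
                }
            }
        }
    }
}

struct CoreErlangGenerator {
    output: String,
    indent: usize,
}

impl CoreErlangGenerator {
    // Bridge: render a Document into self.output at the current indent level,
    // alongside existing write!-based code.
    fn write_document(&mut self, doc: &Document) {
        doc.render(self.indent, &mut self.output);
    }
}
```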

Phase 1: New Code Convention

Phase 2: Opportunistic Migration (Ongoing)

Phase 3: Cleanup (When Migration Naturally Completes)

Approximate migration order (based on which subsystems feature work will touch first):

| Subsystem | Files | write! calls | Likely Feature Trigger |
|---|---|---|---|
| Control flow | 5 files | 249 | Block semantics (BT-204) |
| Gen server | 5 files | 293 | Actor runtime (BT-207) |
| Module generation | 5 files | 367 | Metaclasses (BT-319) |
| Expressions + intrinsics | 2 files | 146 | Stdlib (BT-205) |
| Leaf functions | 5 files | 45 | Various |

Verification: Run just ci after every migration — snapshot tests ensure output is identical.

Key advantage: No dedicated migration phases compete with feature work. The document tree earns its keep by making each feature PR's codegen changes cleaner and more testable.

Migration Health Checks

To prevent the organic migration from stalling indefinitely:

Migration Path

Not applicable — this is an internal refactoring with no user-facing changes.

References