ADR 0018: Document Tree Code Generation (Wadler-Lindig Pretty Printer)
Status
Implemented (2026-02-15)
Context
Beamtalk generates Core Erlang text (ADR 0003) via direct string emission — 1,100+ write!/writeln! macro calls across 28 files in crates/beamtalk-core/src/codegen/core_erlang/. The current architecture uses a single String output buffer with manual indentation tracking:
// Current approach: imperative string building
pub(super) struct CoreErlangGenerator {
output: String, // Direct string buffer
indent: usize, // Manual indentation counter
// ...
}
// Example: generating a method table
writeln!(self.output, "'method_table'/0 = fun () ->")?;
self.indent += 1;
self.write_indent()?;
write!(self.output, "~{{")?;
for (i, (name, arity)) in methods.iter().enumerate() {
if i > 0 { write!(self.output, ", ")?; }
write!(self.output, "'{name}' => {arity}")?;
}
writeln!(self.output, "}}~")?;
self.indent -= 1;
Problems with Direct String Emission
1. Indentation is fragile. The self.indent counter must be manually incremented/decremented in matched pairs. Forgetting a decrement or nesting incorrectly produces malformed Core Erlang that erlc rejects — but the Rust code compiles fine. This class of bug is invisible at compile time.
2. No composability. Code fragments can't be built independently and combined. Every write! call mutates a shared String buffer, so you can't build a function body, inspect it, test it in isolation, or compose it with other fragments. Adding new codegen features requires understanding the mutation flow across multiple files. In practice, AI agents working on codegen issues struggle with the dense write!/writeln! + manual indent patterns — the code is hard to read and reason through, increasing the risk of subtle indentation errors that compile as valid Rust but produce invalid Core Erlang.
3. Testing requires full string comparison. The codegen subsystem has 196 snapshot tests and 170 unit tests. There's no way to unit-test individual code generation fragments (e.g., "does this method table generate correctly?") without running the full pipeline.
4. The codebase is large and growing. 1,100+ write!/writeln! calls across 28 files, with the heaviest files being:
| File | write! calls | Purpose |
|---|---|---|
primitive_implementations.rs | 136 | Intrinsic method bodies |
counted_loops.rs | 126 | Loop codegen |
gen_server/methods.rs | 98 | Method dispatch |
value_type_codegen.rs | 204 | Value type modules |
intrinsics.rs | 80 | Intrinsic dispatch |
As the language grows (pattern matching, exception handling, type annotations), this approach will become increasingly difficult to maintain.
Note: Some specific refactoring tasks (e.g., module renaming) can be addressed independently by extending existing abstractions like ModuleName in erlang_types.rs. The document tree is a long-term architectural improvement for the codegen subsystem as a whole.
What Other Rust-Based Compilers Do
Gleam (Rust → Erlang/JS) uses a Document algebraic data type based on Wadler-Lindig's "Strictly Pretty" algorithm. Code generation builds a tree of document nodes, then renders once:
// Gleam's approach — declarative document composition
let module = docvec![
header,
"-compile([no_auto_import, nowarn_unused_vars]).",
line(),
exports,
join(statements, lines(2)),
];
module.to_pretty_string(80)
Gleam's Document enum has variants: Str, Line, Nest, Group, Vec, Break. The docvec! macro provides ergonomic composition. Gleam uses this for both Erlang and JavaScript backends — same Document type, different rendering. Gleam's implementation is ~875 lines including tests.
Other compilers:
- rustc uses
rustc_ast_pretty(Wadler-style document trees) for AST pretty-printing - prettyplease (Rust ecosystem) uses Wadler-style documents for
synAST formatting - SWC (Rust → JS) uses structured AST → document rendering
- The
prettycrate on crates.io provides a ready-made Wadler-Lindig implementation
Pattern: Modern Rust compilers overwhelmingly use document trees for text code generation, not direct string emission.
Decision
Replace direct write!/writeln! string emission with a Wadler-Lindig document tree for Core Erlang code generation.
Introduce a Document enum (or use the pretty crate) and a docvec! macro for composing Core Erlang output. The code generator builds a tree of Document nodes, which is rendered to a string in a final pass.
What This Looks Like
Before (current):
fn generate_method_table(&mut self, methods: &[(String, usize)]) -> Result<()> {
writeln!(self.output, "'method_table'/0 = fun () ->")?;
self.indent += 1;
self.write_indent()?;
write!(self.output, "~{{")?;
for (i, (name, arity)) in methods.iter().enumerate() {
if i > 0 { write!(self.output, ", ")?; }
write!(self.output, "'{name}' => {arity}")?;
}
writeln!(self.output, "}}~")?;
self.indent -= 1;
Ok(())
}
After (document tree):
fn generate_method_table(&self, methods: &[(String, usize)]) -> Document {
let entries = join(
methods.iter().map(|(name, arity)| docvec!["'", name, "' => ", arity]),
", "
);
docvec![
"'method_table'/0 = fun () ->",
nest(INDENT, docvec![line(), "~{", entries, "}~"]),
]
}
Key differences:
- Returns a
Documentinstead of mutatingself.output - No manual
indent += 1/indent -= 1—nest()handles it declaratively - The result is composable — can be embedded in a larger document
- Can be unit tested:
assert_eq!(generate_method_table(&methods).to_string(), "...")
Note on mutable generator state: CoreErlangGenerator holds mutable state beyond the output buffer — var_context (variable scoping/fresh names) and state_threading (field assignment state variables). Functions that use these will still require &mut self even after adopting Document return types. The migration decouples output construction from state mutation but does not eliminate all &mut self methods. The write_document() bridge (Phase 0) accommodates this: Document-returning functions can still take &mut self when they need to generate fresh variable names.
Document Type
A Document enum for Core Erlang generation (based on Gleam's approach, ~250 lines):
pub enum Document<'a> {
/// A string literal
Str(&'a str),
/// An owned string
String(String),
/// A newline followed by current indentation
Line,
/// Increase indentation for nested content
Nest(isize, Box<Document<'a>>),
/// A sequence of documents
Vec(Vec<Document<'a>>),
/// A group that can be rendered flat or broken across lines
Group(Box<Document<'a>>),
/// A break point — rendered as a space when flat, newline when broken
Break(&'a str),
/// Empty document
Nil,
}
While Core Erlang mostly has fixed formatting, Group and Break are included from the start because pattern matching compilation (planned soon) will generate deeply nested case expressions where readable line-breaking is needed. Including these variants now (~50 extra lines) avoids a disruptive retrofit later and follows Gleam's proven design.
Approach: Roll Our Own vs. Use a Crate
Recommended: Roll a focused implementation (~250 lines), following Gleam's proven design.
| Option | Pros | Cons |
|---|---|---|
pretty crate | Full Wadler-Lindig, well-tested | Adds dependency; includes features we don't need (ForceBroken, FlexBreak) |
| Gleam-style focused | What we need including Group/Break, easy to understand | Must write ~250 lines |
| Full Wadler-Lindig custom | Future-proof for formatting tools | Unnecessary complexity |
We need: Str, String, Line, Nest, Vec, Group, Break, Nil, and a docvec! macro.
Prior Art
| Compiler | Language | Approach | Notes |
|---|---|---|---|
| Gleam | Rust → Erlang/JS | Custom Document tree (~875 lines) | Wadler-Lindig with docvec! macro; shared across Erlang + JS backends |
| rustc | Rust → LLVM | rustc_ast_pretty module | Document tree for AST pretty-printing |
| prettyplease | Rust syn → Rust | Wadler-style documents | Formats generated Rust code |
| Elm compiler | Haskell → JS | Wadler pretty-printer | Standard in Haskell ecosystem |
| PureScript | Haskell → JS | Doc type with render | Composable document fragments |
| OCaml compiler | OCaml → native | Format module | Built-in pretty-printing with boxes |
Universal pattern: Compilers that emit text-based output use document trees. Direct string concatenation is the exception, not the norm.
Gleam's evolution: Gleam started with simpler codegen and grew into the Document approach as complexity increased. Beamtalk is at a similar inflection point — 1,100+ write calls across 28 files.
Why not a full Core Erlang IR? The prior art compilers listed above all use document trees rather than typed target-language IRs for their text backends. A typed Core Erlang IR (Alternative 2, below) would only become valuable if beamtalk needed to transform or optimize the generated Core Erlang before emission — which it currently doesn't, since erlc handles all optimization passes.
User Impact
This is a purely internal refactoring — it changes how the compiler generates Core Erlang, not what it generates. No user-facing behavior changes.
| Persona | Impact |
|---|---|
| Newcomer | None — same REPL, same error messages, same compiled output |
| Smalltalk developer | None — language semantics and syntax unchanged |
| Erlang/BEAM developer | None — generated Core Erlang is byte-for-byte identical |
| Operator | None — no runtime impact; fewer codegen bugs means fewer bad BEAM files in production |
| Compiler contributor | Significant improvement — codegen is easier to read, write, test, and refactor |
Contributor Experience (Primary Beneficiary)
Before: Adding a new codegen feature requires:
- Understanding the
self.outputmutation flow across multiple files - Manually tracking indentation state
- Writing snapshot tests that compare entire generated files
- Risk of indentation bugs that produce valid Rust but invalid Core Erlang
After: Adding a new codegen feature requires:
- Writing a function that returns a
Document - Composing it with existing document fragments using
docvec! - Unit testing the fragment in isolation
- Indentation is handled declaratively
Steelman Analysis
The Strongest Argument Against This ADR
This is a pure refactoring of the largest subsystem in a ~40k-line compiler that has multiple active epics of unimplemented language features. The refactoring produces zero user-visible value. Every hour spent migrating write! calls is an hour not spent on pattern matching, type inference, or the features that will determine whether Beamtalk has users.
This is a legitimate concern. The ADR proceeds despite it because:
- The migration is designed to be organic, not dedicated — new code uses
Document, old code migrates opportunistically during feature work - Codegen is the subsystem that every language feature touches — improving its architecture reduces the cost of all future features
- The
Documenttype itself is ~250 lines of net-new code; the migration cost is spread across feature PRs, not front-loaded
Option A: Document Tree (Recommended)
- 🧑💻 Newcomer contributor: "I can understand
docvec!['init'/1 = fun () ->', nest(4, body)]immediately — it reads like the Core Erlang it produces. I don't need to traceself.indentmutations across files." - 🎩 Smalltalk developer: "Smalltalk compilers use composable IR representations internally. A document tree is the Smalltalk way — objects representing structure, rendered lazily."
- ⚙️ BEAM veteran: "The generated Core Erlang is identical either way. I don't care how the compiler builds the string internally, as long as the output is correct."
- 🏭 Operator: "No runtime impact. But fewer codegen bugs means fewer bad BEAM files in production."
- 🎨 Language designer: "This is the right abstraction level. The codegen should express what Core Erlang to produce, not how to concatenate strings."
Option B: Keep write! (Status Quo)
- 🧑💻 Newcomer contributor: "I already know
write!from Rust — no new concepts to learn. The pattern is simple even if verbose." - 🎩 Smalltalk developer: "Don't fix what isn't broken. Ship language features, not infrastructure."
- ⚙️ BEAM veteran: "Same argument — the output is what matters. Don't touch what's working."
- 🏭 Operator: "Any refactoring risks regressions. The current code works and has 196 snapshot tests validating it."
- 🎨 Language designer: "The
write!approach is battle-tested in this codebase. The refactoring competes with feature work for attention — it must earn its priority."
Option C: Helper Methods Only (No Document Tree)
- 🧑💻 Newcomer contributor: "
indented()closures are idiomatic Rust. No new abstraction to learn." - 🎩 Smalltalk developer: "Incremental improvement. Ship small fixes now, rethink architecture later."
- ⚙️ BEAM veteran: "Centralizing module names into helpers solves the real pain point without touching 1,100 call sites."
- 🏭 Operator: "Zero-risk change — each helper can be adopted one call site at a time."
- 🎨 Language designer: "This is 80% of the value at 20% of the cost. But it doesn't solve composability or fragment testing."
Tension Points
- Risk vs. maintainability: The status quo has zero risk today but increasing maintenance cost as features grow. The document tree has one-time migration risk but reduces ongoing cost. Helper methods sit in the middle — low risk, moderate improvement.
- Familiarity:
write!is standard Rust;docvec!is a custom macro. Butdocvec!is learnable in minutes and the pattern is well-documented (Gleam, Wadler-Lindig papers). - Timing: Multiple active epics will change codegen substantially. Doing a dedicated migration now risks conflicting with feature PRs. Organic migration avoids this by folding migration into feature work.
- Opportunity cost: Time spent on infrastructure vs. features. Mitigated by organic migration — the
Documenttype itself is ~250 lines; no dedicated migration phases compete with feature work.
Alternatives Considered
1. Helper Methods on CoreErlangGenerator (80/20 Solution)
Extract helper methods that encapsulate common patterns without introducing a new intermediate representation:
impl CoreErlangGenerator {
fn indented(&mut self, body: impl FnOnce(&mut Self) -> Result<()>) -> Result<()> {
self.indent += 1;
body(self)?;
self.indent -= 1;
Ok(())
}
fn emit_call(&mut self, module: &str, function: &str, args: &[&str]) -> Result<()> { ... }
fn emit_let(&mut self, var: &str, body: impl FnOnce(&mut Self) -> Result<()>) -> Result<()> { ... }
}
Partially adopted: indented() and targeted helpers should be introduced regardless — they provide immediate value at zero risk and can be adopted one call site at a time. However, helpers alone don't solve composability (fragments can't be returned, stored, or tested independently) or the fundamental problem of interleaved mutation. Helper methods are a stepping stone, not the destination.
2. Template Engine (Tera, Askama)
Use a template engine with Core Erlang templates containing placeholders.
// hypothetical template
module '{{ module_name }}' [{{ exports }}]
attributes [{{ attributes }}]
{% for function in functions %}
'{{ function.name }}'/{{ function.arity }} = fun ({{ function.params }}) ->
{{ function.body }}
{% endfor %}
Rejected because:
- Core Erlang's nested expression structure doesn't map well to flat templates
- Recursive expression generation (message sends inside message sends) requires programmatic construction
- Template debugging is harder than Rust code debugging
- Would require a new dependency and template language knowledge
3. Typed Core Erlang IR
Build a full typed intermediate representation of Core Erlang:
enum CoreExpr {
Let { var: String, value: Box<CoreExpr>, body: Box<CoreExpr> },
Apply { fun: Box<CoreExpr>, args: Vec<CoreExpr> },
Case { expr: Box<CoreExpr>, clauses: Vec<CoreClause> },
Map { pairs: Vec<(CoreExpr, CoreExpr)> },
Literal(CoreLiteral),
// ...
}
Rejected because:
- Significantly more work than a document tree (~2,000+ lines for a full Core Erlang AST)
- Over-engineered for our needs — we don't transform or optimize the IR, we just emit it
- The beamtalk AST → Core Erlang AST mapping would be a large rewrite
- A document tree provides 90% of the benefit at 10% of the cost
A typed IR may become valuable later (for optimization passes, multiple backends), but it's premature now. The document tree doesn't preclude adding an IR later — they serve different purposes.
4. Use the pretty Crate
Use the existing pretty crate from crates.io instead of rolling our own.
Not adopted because:
- The
prettycrate includes features beyond what we need (ForceBroken,FlexBreak,Annotatedwith generic metadata) — adding API surface and dependency weight we won't use - A focused custom implementation (~250 lines) with
Group/Breakis simpler, has zero dependencies, and is tailored to our needs - Gleam took the same approach (custom ~875 line implementation) for similar reasons
- Owning the implementation lets us add Core Erlang-specific helpers (e.g.,
core_atom(),core_binary()) directly on theDocumenttype
5. Incremental Adoption — write! + Document Hybrid
Keep write! for existing code, use Document only for new code.
Partially adopted: The migration strategy (see Implementation) is incremental. But the end goal is full migration — a permanent hybrid would be confusing for contributors who must learn both patterns.
6. Builder Pattern
Use a builder API that wraps String construction with composable methods:
CoreErlangBuilder::new()
.line("'method_table'/0 = fun () ->")
.indent(|b| {
b.text("~{")
.join(methods.iter(), ", ", |(name, arity)| format!("'{name}' => {arity}"))
.text("}~")
})
.build() // → String
Not adopted because:
- Still fundamentally imperative — closures execute in order, so fragment reuse requires
Cloneor re-execution - The
indent()closure approach is essentially Alternative 1 (helper methods) with a wrapper type — it doesn't enable building fragments independently and composing them later - No standard rendering pipeline — each builder call immediately appends to the internal buffer, so you can't inspect or transform the structure before rendering
- The
Documentalgebraic type is the industry-standard approach (Gleam, rustc, prettyplease) and enablesGroup/Breakline-wrapping decisions that a builder can't express declaratively
What it does better than write!: Eliminates manual indent += 1 / indent -= 1 pairs, so it solves Problem #1 (fragile indentation). If composability and fragment testing are not priorities, a builder is a pragmatic middle ground.
Consequences
Positive
- Easier refactoring: Module name changes, dispatch pattern changes, and new language features require changing document constructors, not hunting through
write!calls - Declarative indentation:
nest()eliminates manualindent += 1/indent -= 1pairs and the bugs they cause - Composable fragments: Code generation functions return
Documentvalues that can be composed, tested, and reused - Unit testable: Individual codegen functions can be tested without running the full pipeline
- Readable codegen:
docvec!expressions read like the Core Erlang they produce - AI-agent friendly: Structured document construction is easier for AI agents to modify correctly than scattered string mutations
Negative
- Migration effort: ~1,100
write!call sites across 28 files need conversion over time (mitigated: organic migration during feature work, not a dedicated project) - New concept: Contributors must learn the
Documenttype anddocvec!macro (mitigated: simple API, well-documented pattern) - Regression risk: Any large refactoring risks introducing bugs (mitigated: 196 snapshot tests catch output changes)
- Test migration: 170 codegen unit tests will need updating as each file migrates to return
Documentvalues - Hybrid-state confusion: During the organic migration period, some functions return
Documentwhile others write toself.output— contributors must understand both patterns (mitigated: clear documentation of which pattern to use where; new code always usesDocument) - No user-visible benefit: This is pure internal improvement — it competes with feature work for attention
Neutral
- Generated output unchanged: Byte-for-byte identical Core Erlang output (verified by snapshot tests)
- Performance: Document tree adds one allocation + render pass. Negligible compared to
erlccompilation time - Future backends: If Beamtalk ever adds an Erlang source backend (ADR 0003 leaves this open), the
Documenttype would be reusable - Source maps: If source-level debugging is added later, the document tree enables it more naturally than
write!— anAnnotated(Span, Box<Document>)variant can carry source positions through construction, and the renderer emits Core Erlang line annotations centrally. The currentwrite!approach would require manually inserting line annotations across 28 files. Gleam demonstrates this pattern with-file()annotations emitted viadocvec!
Implementation
Adoption Strategy: New Code First, Organic Migration
Rather than a dedicated multi-phase migration project that competes with feature work, adopt the document tree organically:
Phase 0: Foundation (~S — single PR)
- Add
Documentenum (withGroup/Breakvariants) anddocvec!macro incrates/beamtalk-core/src/codegen/core_erlang/document.rs - Implement
to_string()/render()for the document type, including group/break rendering logic - Add unit tests for the document primitives
- Add a
write_document()bridge method toCoreErlangGeneratorthat renders aDocumenttoself.output— enabling gradual per-function migration
Phase 1: New Code Convention
- All new codegen functions return
Documentinstead of writing toself.output write_document()bridge allows newDocument-returning functions to coexist with oldwrite!code- No dedicated migration of existing code
Phase 2: Opportunistic Migration (Ongoing)
- When touching a file for a feature (e.g., pattern matching, metaclasses, new control flow), migrate that file's functions to return
Document - Each feature PR naturally migrates the functions it modifies
- 196 snapshot tests verify byte-for-byte identical output at each step
Phase 3: Cleanup (When Migration Naturally Completes)
- Remove
self.output: Stringandself.indent: usizefromCoreErlangGenerator - Change remaining
CoreErlangGeneratormethods from&mut self → Result<()>to&self → Document - Update
mod.rsentry point to build document tree and render once
Approximate migration order (based on which subsystems feature work will touch first):
| Subsystem | Files | write! calls | Likely Feature Trigger |
|---|---|---|---|
| Control flow | 5 files | 249 | Block semantics (BT-204) |
| Gen server | 5 files | 293 | Actor runtime (BT-207) |
| Module generation | 5 files | 367 | Metaclasses (BT-319) |
| Expressions + intrinsics | 2 files | 146 | Stdlib (BT-205) |
| Leaf functions | 5 files | 45 | Various |
Verification: Run just ci after every migration — snapshot tests ensure output is identical.
Key advantage: No dedicated migration phases compete with feature work. The document tree earns its keep by making each feature PR's codegen changes cleaner and more testable.
Migration Health Checks
To prevent the organic migration from stalling indefinitely:
- Tracking: After Phase 0, count remaining
write!/writeln!calls withgrep -c 'write!\|writeln!' crates/beamtalk-core/src/codegen/core_erlang/**/*.rs. Current baseline: ~1,200 calls. - Checkpoint: If after 6 months of feature work, fewer than 10% of call sites have migrated, revisit whether the organic strategy is working or a dedicated migration sprint is warranted.
- Phase 3 trigger: When ≤50
write!calls remain (i.e., the long tail of rarely-touched files), schedule a dedicated cleanup PR to finish the migration and remove theself.outputbuffer.
Migration Path
Not applicable — this is an internal refactoring with no user-facing changes.
References
- Related ADR: ADR 0003 — Keep Core Erlang as Primary Code Generation Target
- Related ADR: ADR 0007 — Compilable Standard Library with Primitive Injection (class kind routing maintained by document tree migration)
- Related ADR: ADR 0014 — Beamtalk Test Framework (196 codegen snapshot tests serve as verification)
- Gleam's Document implementation: compiler-core/src/pretty.rs (~875 lines)
- Gleam's Erlang codegen using documents: compiler-core/src/erlang/ (uses
docvec!throughout) - Wadler-Lindig paper: "Strictly Pretty" (2000) by Christian Lindig
prettycrate: docs.rs/prettyprettypleasecrate: github.com/dtolnay/prettyplease