ADR 0044: Comments as First-Class AST Nodes
Status
Accepted | Implemented 2026-02-28
Context
Current State
Comments in Beamtalk are currently handled as token-level trivia. The lexer captures
them correctly — the Trivia enum distinguishes LineComment, BlockComment, and
DocComment — but the parser discards them when building the AST, with one exception:
/// doc comments are lifted onto ClassDefinition and MethodDefinition nodes per
ADR 0008. All other comments are lost after parsing.
This means:
// Calculate compound interest using P * (1 + r/n)^(nt)
calculateInterest: principal rate: rate =>
  principal * ((1 + (rate / 12)) raisedTo: 12)
After parsing, the comment is gone. The MethodDefinition node has no record it existed.
Why This Is Now a Problem
Three converging needs expose this gap:
1. Formatter / unparser. The lint framework (BT-951) and the lint violation cleanup
epic (BT-962) make it clear that Beamtalk needs a beamtalk fmt command. A formatter
must round-trip source files losslessly. If comments are not in the AST, the formatter
silently drops them — which is unacceptable.
2. Live tool persistence. A developer tool should allow classes to be built or
modified incrementally in a live environment and then persisted back to .bt files on
disk. Beamtalk is disk-backed (not image-based), so persistence means unparse-to-source.
A class modified in a live tool was originally written with comments; those comments
must survive the round-trip. Additionally, a class created in a live tool (no source
file origin, no token stream to recover from) needs a way to attach comments to
synthesized AST nodes.
3. Tooling == compiler. ADR 0024 established the principle that the compiler pipeline should be the language service — one parser, one AST, every tool consuming the same data. This is Anders Hejlsberg's key insight from TypeScript: the moment you have two parsers or two ASTs, they diverge and tooling becomes a maintenance burden. Comments being absent from the compiler's AST is a direct violation: a hypothetical formatter would need a separate parse to recover them, breaking the principle.
Constraints
- .bt files remain the source of truth on disk. This is not a full Smalltalk image.
- The AST must support both parsed code (comments from tokens) and synthesized code (comments attached programmatically, no source positions available).
- The existing doc_comment: Option<String> on classes and methods (ADR 0008) must be preserved and generalised, not replaced.
- Comment handling must not complicate the codegen pipeline: the compiler ignores comments.
- Synthesized AST nodes (from live tools or auto-generated code like with*: methods) have no source positions. Comment attachment must work without valid spans.
Decision
Comments are first-class data attached directly to AST nodes. Every comment belongs to exactly one AST node, either as a leading comment (appears before the node in source) or a trailing comment (appears at the end of the same line as the node). No separate comment collections exist at any scope.
The Core Invariant
Every comment belongs to exactly one AST node. No separate flat comment lists exist at module scope or any other scope.
The association rule:
- A comment that appears before a node → leading comment on that node
- A comment that appears at the end of the same line as a node → trailing comment on that node
- A comment after the last node in a file → trailing comment on the last node
- A comment before the first node in a file → leading comment on the first node
This rule is unambiguous and eliminates the dual-storage anti-pattern.
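As an illustration only, the association rule can be sketched as a small classifier over line numbers. The names Placement and classify are hypothetical and do not appear in the compiler; the real parser works from token trivia, not from a list of node line ranges. The sketch also assumes every comment sits between top-level nodes or at the end of a node's last line.

```rust
/// Where a comment ends up relative to its owning node (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    /// Leading comment on the node at this index.
    Leading(usize),
    /// Trailing comment on the node at this index.
    Trailing(usize),
}

/// Hypothetical helper: `node_lines` holds the (first_line, last_line) of each
/// node in source order; `comment_line` is the line the comment starts on.
/// Returns `None` only when there are no nodes at all (the empty-file case).
pub fn classify(node_lines: &[(u32, u32)], comment_line: u32) -> Option<Placement> {
    // End-of-line comment: it shares a line with the end of some node.
    if let Some(i) = node_lines.iter().position(|&(_, last)| last == comment_line) {
        return Some(Placement::Trailing(i));
    }
    // Otherwise it leads the first node that starts after it.
    if let Some(i) = node_lines.iter().position(|&(first, _)| first > comment_line) {
        return Some(Placement::Leading(i));
    }
    // Nothing follows: attach to the last node as a trailing comment.
    node_lines.len().checked_sub(1).map(Placement::Trailing)
}
```

Because the three checks are tried in a fixed order and each returns a single owner, a comment can never be claimed by two nodes, which is the property the invariant asks for.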
Doc Comment vs Regular Comment Deduplication
/// doc comments remain in doc_comment: Option<String> on ClassDefinition and
MethodDefinition as established by ADR 0008. They are not duplicated into
CommentAttachment.leading. The parser handles them in separate passes:
collect_doc_comment() extracts /// trivia into doc_comment;
collect_comment_attachment() extracts // and /* */ trivia into CommentAttachment.
This avoids Go's dual-storage anti-pattern: each comment type has exactly one home.
| Comment syntax | AST storage | Runtime storage | Established by |
|---|---|---|---|
| /// doc comment | doc_comment: Option<String> | CompiledMethod.doc (object state) | ADR 0008 (AST field), ADR 0033 (runtime path) |
| // line comment | CommentAttachment.leading or .trailing | Not compiled (formatter only) | This ADR |
| /* */ block comment | CommentAttachment.leading or .trailing | Not compiled (formatter only) | This ADR |
Note: ADR 0033 superseded ADR 0008's EEP-48 doc chunk generation. /// comments now
compile to doc: message sends that populate CompiledMethod.doc at class load time,
rather than being embedded as BEAM file chunks. The doc_comment AST field remains the
correct vehicle for carrying this data through compilation. Regular // and /* */
comments have no runtime representation — they exist in the AST solely for the
formatter/unparser.
Synthesized AST Nodes
Comments on synthesized AST nodes (from live tools, auto-generated with*: methods, or
ClassBuilder codegen) use Span::default() for the span field. Consumers that inspect
comment spans (LSP, diagnostics) must treat Span::default() as "no source location" —
consistent with how synthesized Expression nodes already handle missing spans.
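A consumer-side check for the "no source location" convention might look like the sketch below. The Span struct here is a stand-in with assumed start/end fields, and source_location is a hypothetical helper; the compiler's real span type and accessors may differ.

```rust
/// Stand-in for the compiler's span type; the real `Span` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

/// Hypothetical helper for LSP or diagnostic code: map the default span
/// to `None` so callers cannot mistake it for a real position.
pub fn source_location(span: Span) -> Option<Span> {
    if span == Span::default() {
        None
    } else {
        Some(span)
    }
}
```

One consequence of this convention, under the assumed representation, is that a genuinely empty span at offset 0 is indistinguishable from a synthesized one.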
Section-Divider Comments
Comments that serve as section dividers between methods (e.g. // ---- Query Methods ----)
are not semantically associated with any single method. Under the "leading on following
node" rule, they attach as leading comments on the next method. This is imperfect — if
that method is deleted, the section comment disappears. However, this is an acceptable
trade-off: section dividers are rare in idiomatic Beamtalk, the formatter can emit them
faithfully during round-trips, and the alternative (free-floating comments) reintroduces
dual storage. A future refinement could add an explicit SectionComment node at the
class body level if this proves to be a practical problem.
AST Changes
New CommentAttachment type
/// Comments attached to an AST node.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct CommentAttachment {
    /// Comments appearing on lines immediately before this node.
    /// Ordered top-to-bottom as they appear in source.
    pub leading: Vec<Comment>,
    /// A single end-of-line comment on the same line as this node.
    pub trailing: Option<Comment>,
}

impl CommentAttachment {
    pub fn is_empty(&self) -> bool {
        self.leading.is_empty() && self.trailing.is_none()
    }
}
Expression nodes carry comments
Rather than adding CommentAttachment to all 19 Expression variants, comments attach
at the statement sequence level — the granularity where comments naturally live in
source code. A comment between two expressions in a method body belongs to one of those
expressions, not to a sub-expression within them.
The ExpressionStatement wrapper carries the attachment:
/// An expression in a statement position, with optional surrounding comments.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExpressionStatement {
    pub comments: CommentAttachment,
    pub expression: Expression,
}
Method bodies, block bodies, and module-level expression sequences all become
Vec<ExpressionStatement> instead of Vec<Expression>.
Note: comments inside block bodies (e.g. [:each | // Transform... \n each asUppercase])
are covered because Block.body is also Vec<ExpressionStatement>. Comments between
sub-expressions within a single expression (e.g. between arguments in a keyword send)
are not preserved by this design — they remain at statement granularity only. This
is a deliberate scope limitation: sub-expression comments are rare and the formatter can
handle them via whitespace preservation rather than AST attachment.
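To make the statement-level wrapper concrete, here is a reduced, self-contained sketch. The Expression enum is cut down to two variants and Comment to a single field purely for illustration; the expressions function is a hypothetical helper showing how a comment-unaware consumer could project the wrapper away.

```rust
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct Comment {
    pub content: String,
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct CommentAttachment {
    pub leading: Vec<Comment>,
    pub trailing: Option<Comment>,
}

/// Drastically reduced stand-in for the real `Expression` enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expression {
    Integer(i64),
    Identifier(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExpressionStatement {
    pub comments: CommentAttachment,
    pub expression: Expression,
}

impl ExpressionStatement {
    /// Wrap an expression that carries no comments.
    pub fn bare(expression: Expression) -> Self {
        Self { comments: CommentAttachment::default(), expression }
    }
}

/// Hypothetical helper: a comment-unaware consumer (codegen, lint)
/// iterates the expressions and never looks at `comments`.
pub fn expressions(body: &[ExpressionStatement]) -> impl Iterator<Item = &Expression> {
    body.iter().map(|stmt| &stmt.expression)
}
```

With a projection like this, most existing loops over a body would change only at the point where they obtain the iterator.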
Class, method, and state declaration nodes
ClassDefinition and MethodDefinition already have doc_comment: Option<String> for
/// comments. They gain a full CommentAttachment for // and /* */ comments that
appear before the definition (separate from doc comments):
pub struct MethodDefinition {
    pub comments: CommentAttachment,   // // and /* */ above the method
    pub doc_comment: Option<String>,   // /// doc comment (existing, ADR 0008)
    // ... existing fields unchanged ...
}

pub struct ClassDefinition {
    pub comments: CommentAttachment,   // // and /* */ above the class
    pub doc_comment: Option<String>,   // /// doc comment (existing, ADR 0008)
    // ... existing fields unchanged ...
}
StateDeclaration gains both CommentAttachment (for // and /* */) and
doc_comment: Option<String> (for ///). State fields commonly carry explanatory
comments; capturing /// at parse time is necessary so the formatter, LSP, and future
field reflection can all use the same data source:
pub struct StateDeclaration {
    pub comments: CommentAttachment,   // // and /* */ above the field
    pub doc_comment: Option<String>,   // /// doc comment (field-level)
    // ... existing fields unchanged (name, type_annotation, default_value, span) ...
}
Runtime compilation of field doc comments is deferred. The /// on a state field
is collected by the parser and available throughout the AST pipeline (formatter, LSP,
semantic analysis), but the compiler does not yet emit anything for it. A follow-up ADR
covering FieldDescriptor objects will define the runtime storage and accessor API.
FieldDescriptor is the right model — consistent with Class and CompiledMethod
being first-class objects you introspect via message sends. A FieldDescriptor will
carry .name, .doc, .typeAnnotation, .defaultValue as messages, participate in
the metaclass tower, and be accessible via MyClass fields returning a collection of
FieldDescriptor instances held by the class object. This mirrors how MyClass methods
returns CompiledMethod instances from the class's method dictionary today. Both are
class-side objects — not per-instance data.
Type annotations will be stored on FieldDescriptor (and on CompiledMethod for
consistency) as inert metadata — accessible via reflection but never enforced at
runtime. This follows the Strongtalk model: type annotations are optional, the runtime
performs no type checks, but the annotations survive as data so live tools and
reflection APIs can read them without re-parsing source. The Elixir @spec precedent
on BEAM validates this: typespecs are compiled into module attributes, used by Dialyzer
and documentation tooling, ignored by the VM. A synthesized class built by a live tool
with no source file would otherwise have no way to expose type information.
The current MethodDefinition.return_type and ParameterDefinition.type_annotation
fields in the AST are discarded by codegen today. The FieldDescriptor ADR will define
the compilation target for both field and method type annotations, and CompiledMethod
will be updated in the same pass for consistency.
Module
Module.leading_comments: Vec<Comment> (currently the only comment store) is replaced
by leading comments on the first item in the module. If the module is empty, file-level
comments are preserved in a file_leading_comments: Vec<Comment> field on Module as
a narrow, explicitly documented exception to the core invariant. A synthetic EmptyFile
node was considered but rejected — adding a node purely to carry comments misrepresents
the program structure. The exception field is honest: an empty file is a real edge case,
not a missing node. The invariant is updated to read: "Every comment belongs to exactly
one AST node, except file-level comments in an empty module which are held in
Module.file_leading_comments."
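The empty-module exception can be pictured with the following sketch. Item, its leading field, and attach_file_comments are hypothetical simplifications; in the real parser, comments before the first item would normally arrive through that item's own leading trivia, so this only illustrates where the data ends up.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Comment(pub String);

/// Simplified stand-in for any top-level item (class or statement).
#[derive(Debug, Default)]
pub struct Item {
    pub leading: Vec<Comment>,
}

#[derive(Debug, Default)]
pub struct Module {
    pub items: Vec<Item>,
    /// Used only when `items` is empty.
    pub file_leading_comments: Vec<Comment>,
}

/// Hypothetical helper: place file-level comments according to the invariant.
pub fn attach_file_comments(module: &mut Module, comments: Vec<Comment>) {
    match module.items.first_mut() {
        Some(first) => {
            // Keep file-level comments above the item's own leading comments.
            let mut merged = comments;
            merged.append(&mut first.leading);
            first.leading = merged;
        }
        // Empty module: the one documented exception to the invariant.
        None => module.file_leading_comments = comments,
    }
}
```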
Parser Changes
The parser attaches comments during construction using the existing trivia infrastructure:
fn collect_comment_attachment(&mut self) -> CommentAttachment {
    let mut leading = Vec::new();
    for trivia in self.current_token().leading_trivia() {
        match trivia {
            Trivia::LineComment(text) | Trivia::BlockComment(text) => {
                leading.push(Comment { content: text.into(), kind: ..., span: ... });
            }
            Trivia::DocComment(_) => { /* handled separately by collect_doc_comment() */ }
            Trivia::Whitespace(_) => {}
        }
    }
    let trailing = self.collect_trailing_comment();
    CommentAttachment { leading, trailing }
}
Note on trailing comments: Leading comments are found in the current token's leading trivia — comments that precede the token the parser is about to consume. Trailing comments (end-of-line after a node) require a different mechanism: after constructing a node, the parser checks the next token's leading trivia for a comment on the same line as the node's last token. This is a post-parse attachment step, not a look-ahead into the current token.
Note on collect_doc_comment interaction: The existing collect_doc_comment() uses
_ => lines.clear() — any non-doc trivia (including // comments) resets the doc
comment accumulator. This means a // comment interleaved before /// lines is correctly
excluded from the doc comment. The collect_comment_attachment() function processes the
same trivia but skips DocComment entries, so each comment type has exactly one owner.
The ordering dependency is: collect_doc_comment() runs first (for ///), then
collect_comment_attachment() runs on the same trivia (for // and /* */).
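The two passes over the same trivia can be sketched in isolation as follows. This is a simplified model, not the parser's code: the Trivia enum carries owned strings, the functions are free functions instead of parser methods, collect_leading_comments is a hypothetical name for the leading half of collect_comment_attachment(), and whitespace is assumed not to reset the doc accumulator.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Trivia {
    LineComment(String),
    BlockComment(String),
    DocComment(String),
    Whitespace(String),
}

/// Pass 1: gather `///` lines; any regular comment resets the accumulator,
/// so only the doc lines after the last `//` or `/* */` survive.
pub fn collect_doc_comment(trivia: &[Trivia]) -> Option<String> {
    let mut lines: Vec<&str> = Vec::new();
    for t in trivia {
        match t {
            Trivia::DocComment(text) => lines.push(text.as_str()),
            // Assumption: whitespace between doc lines is ignored.
            Trivia::Whitespace(_) => {}
            _ => lines.clear(),
        }
    }
    if lines.is_empty() { None } else { Some(lines.join("\n")) }
}

/// Pass 2: gather `//` and `/* */` comments, skipping doc comments entirely.
pub fn collect_leading_comments(trivia: &[Trivia]) -> Vec<String> {
    trivia
        .iter()
        .filter_map(|t| match t {
            Trivia::LineComment(text) | Trivia::BlockComment(text) => Some(text.clone()),
            Trivia::DocComment(_) | Trivia::Whitespace(_) => None,
        })
        .collect()
}
```

Each trivia entry is matched by exactly one of the two passes, which is how the "one owner per comment" rule falls out of the structure.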
Unparser / Formatter
The unparser emits comments at their attached positions:
fn unparse_expression_statement(stmt: &ExpressionStatement) -> Document {
    // Each leading comment goes on its own line above the expression.
    let leading: Vec<Document> = stmt.comments.leading.iter()
        .map(|c| docvec![c.text(), Document::Newline])
        .collect();
    // A trailing comment follows the expression on the same line.
    let trailing = stmt.comments.trailing.as_ref()
        .map(|c| docvec![Document::Text(" "), c.text()])
        .unwrap_or(Document::Nil);
    docvec![leading, unparse_expression(&stmt.expression), trailing]
}
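For readers without the Document API at hand, the same emission order can be shown with plain strings. This is a simplified sketch: the expression is already rendered text, Comment holds its source text verbatim, and unparse_statement is a hypothetical name, not the planned unparser entry point.

```rust
pub struct Comment {
    /// Comment text including its delimiter, e.g. "// total".
    pub text: String,
}

#[derive(Default)]
pub struct CommentAttachment {
    pub leading: Vec<Comment>,
    pub trailing: Option<Comment>,
}

pub struct ExpressionStatement {
    pub comments: CommentAttachment,
    /// Stand-in: the expression is assumed to be unparsed already.
    pub expression: String,
}

/// Emit leading comments (one per line), the expression, then the trailing comment.
pub fn unparse_statement(stmt: &ExpressionStatement) -> String {
    let mut out = String::new();
    for c in &stmt.comments.leading {
        out.push_str(&c.text);
        out.push('\n');
    }
    out.push_str(&stmt.expression);
    if let Some(c) = &stmt.comments.trailing {
        out.push(' ');
        out.push_str(&c.text);
    }
    out
}
```

Nothing here consults a span or a token stream, so the same function works for parsed and synthesized statements alike.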
Programmatic Construction (Live Tools)
A class built in a live tool with no source origin attaches comments directly:
let method = MethodDefinition {
    comments: CommentAttachment {
        leading: vec![Comment::line("Calculate compound interest")],
        trailing: None,
    },
    doc_comment: Some("P * (1 + r/n)^(nt)".into()),
    // ...
};
No source positions needed. The comment is data, not recovered from text.
Prior Art
| Language | Approach | Key Lesson |
|---|---|---|
| Newspeak | Comments are first-class metadata on AST nodes, accessible via mirrors at runtime. Tagged metadata uses (*:tag: ... *) syntax. | Direct ancestor. Validates Option A. Comments as data, not trivia, is the right model for a reflective language. |
| Pharo | RBComment objects attached to enclosing sequence nodes by source interval. Not full AST nodes but associated data. | Comments should be associated data, not recovered from text gaps. The interval-based approach is fragile under AST manipulation. |
| Go (gofmt) | Dual storage: flat Comments []*CommentGroup on File AND Doc/Comment fields on individual nodes. | The Go team calls free-floating comments "the single biggest mistake" in the AST design. Dual storage causes constant synchronisation burden. Avoid. |
| Gleam | Separate Vec<Comment> sorted by position; consumed by position during formatter traversal. | Clean separation but makes programmatic AST construction with comments impossible. Known idempotency bugs. |
| Elixir | Code.string_to_quoted_with_comments/2 returns AST + separate comment list. Sourceror attaches comments to node metadata. | The separate-list approach is backward-compatible but fragile. Sourceror (node metadata) is closer to Option A and is the preferred third-party approach. |
| Rust (rustfmt) | Comments not in AST; recovered from source text gaps between spans (missed_spans.rs). | Acknowledged as rustfmt's primary source of bugs. Completely unsuitable for synthesized ASTs. Avoid. |
| Tree-sitter | Comments are full CST nodes (extras), siblings of code nodes. | Maximum fidelity. Every tree walk must handle comment siblings. Appropriate for a generic parser framework, overkill for a single language's compiler. |
User Impact
Newcomer (coming from Python/JS/Ruby): No visible change to the language syntax.
Comments work exactly as before. The benefit appears in tooling: beamtalk fmt preserves
their comments rather than silently deleting them. This builds trust that the tools are
safe to run.
Smalltalk developer: Consistent with Pharo and Newspeak models. Comments surviving refactoring and formatting is the expected behaviour in Smalltalk environments. The live-edit-then-persist model matches the Pharo workflow of editing methods in a browser and filing out to disk.
Erlang/BEAM developer: No impact on compiled output from this ADR — CommentAttachment
fields (// and /* */) are skipped by codegen entirely and never appear in Core Erlang
or BEAM bytecode. StateDeclaration.doc_comment and type annotations are also skipped
for now (deferred to the FieldDescriptor ADR).
The FieldDescriptor ADR will introduce compilation targets for field doc comments,
field type annotations, and method type annotations — compiled to message sends to the
class object during class definition, following the same pattern as CompiledMethod.doc
(ADR 0033). These are class-side operations: CompiledMethod instances live in the
class's method dictionary, and FieldDescriptor instances will be held by the class
object — neither are per-instance data. The compiled output is class-definition
behaviour in Core Erlang, not BEAM metadata chunks or instance initialisation.
Regular // and /* */ comments never reach the runtime under any scenario.
Tooling developer (LSP, formatter, refactoring tools): Major improvement. Every tool gets complete AST data in one parse. No secondary pass to recover comments. No separate token stream to maintain. Synthesising new AST nodes with comments is a simple field assignment.
Production operator: No impact. Comments do not affect runtime behaviour, hot code reloading, or OTP integration.
Steelman Analysis
Option B: Separate Comment Index (Gleam/Elixir style)
🧑💻 Newcomer: "This changes nothing about the AST I'm used to. I can ignore the comment list entirely if I'm writing a tool that doesn't care about comments."
🎩 Smalltalk purist: "Keeping comments out of the AST is cleaner — in Smalltalk, comments are documentation, not program structure. A separate list honours that."
⚙️ BEAM veteran: "The Gleam team chose this approach and it works for their formatter. If it's good enough for Gleam it's good enough for Beamtalk."
🎨 Language designer: "AST nodes stay simple. No wrapper types, no field additions to every variant. Easier to reason about the AST in isolation."
Tension: The steelman is strongest for static formatting of existing source files. It collapses entirely for synthesized ASTs (live tools) — there is no viable way to attach comments to nodes that have no source positions. This is the decisive failure.
Option C: Concrete Syntax Tree (Tree-sitter style)
🧑💻 Newcomer: "The tree is complete. I never have to worry about lost information."
⚙️ BEAM veteran: "Incremental parsing, precise source maps, IDE-quality tooling from day one."
🎨 Language designer: "Full fidelity is the only honest answer. Everything else is approximation."
Tension: Tree-sitter is a parser framework, not a language compiler. The overhead of handling comment siblings in every AST walk, and the rearchitecture required, is not justified today. The decision point to revisit Option C is when all three of these conditions hold:
- A tree-sitter grammar exists for editor integration (Neovim/Helix/Zed syntax highlighting). This is a separate grammar from the compiler parser — editor integration alone does not force Option C.
- The two-parser problem is causing real maintenance cost. Once a tree-sitter grammar exists, the compiler parser and the tree-sitter grammar will diverge under language changes. When patching that divergence is taking more effort than a one-time rearchitecture, the "tooling == compiler" principle (ADR 0024) makes the answer clear.
- Error recovery or incremental parse latency is a measurable user complaint. Tree-sitter's incremental reparsing is its key advantage. Beamtalk files are small by Smalltalk convention, so full-reparse latency is unlikely to matter soon. The trigger is profiling evidence or LSP user complaints about broken diagnostics on malformed input.
Option A is deliberately forward-compatible with this migration. CommentAttachment is
the shape a CST lowering pass would produce anyway. Moving to Option C replaces where
CommentAttachment is populated (parser → CST lowering pass), not how AST consumers
use it. All Phase 3 and Phase 4 work survives the migration intact.
Option D: Preserve Token Stream (TypeScript/Roslyn style)
🧑💻 Newcomer: "Zero changes to the AST. I can learn one thing at a time."
⚙️ BEAM veteran: "This is the industry standard. TypeScript, Roslyn, and rust-analyzer all do it this way. Proven at massive scale."
🎨 Language designer: "The token stream is already produced by the lexer. Keeping it is natural — we're just not throwing it away. The AST stays clean, the formatter gets what it needs, and nobody else pays a tax."
🏭 Operator: "Zero runtime impact — the token stream is only retained for tooling paths, not compilation."
Tension: This is the strongest alternative for pure formatting. But it fails on two
fronts, not one. First, synthesized ASTs have no token stream — a class built in a live
tool cannot emit comments because there are no tokens to walk. Second, and more
fundamentally, token preservation is a formatting concept only: it says nothing about
where type annotations and doc comments live on FieldDescriptor and CompiledMethod
at runtime. Those runtime objects need their data stored as fields — which is exactly
what Option A provides. A system that uses Option D for formatting and Option A for
runtime reflection ends up with two comment/annotation storage models anyway, which is
the split we are trying to avoid.
Tension Points
Option B is faster to implement and has no AST churn. Option D (token preservation) is
the industry standard for formatters. Option A is the only approach that handles both
synthesized ASTs from live tools and runtime metadata storage on FieldDescriptor
and CompiledMethod.
Live tool persistence and FieldDescriptor are not hypothetical future concerns —
they are confirmed requirements. FieldDescriptor will store .doc, .typeAnnotation,
and .defaultValue as runtime fields; CompiledMethod will gain type annotation
storage in the same pass. Option B and D have no answer for synthesized objects with no
source file origin. Building the formatter on Option B or D and then rebuilding on
Option A when FieldDescriptor lands means paying the AST churn twice. The decision
is to pay it once, now.
Alternatives Considered
Alternative A: Span-Gap Recovery (rustfmt style)
Recover comments from source text by examining the gap between adjacent node spans. Requires no AST changes.
Rejected: Rustfmt maintainers describe this as the primary source of formatter bugs. Completely impossible for synthesized AST nodes (no source text to examine). Violates the "tooling == compiler" principle by requiring the formatter to re-examine source text rather than using the AST.
Alternative B: Separate Comment Index (Gleam/Elixir style)
Store a Vec<Comment> sorted by byte position alongside the Module. During formatting,
consume comments positionally.
Rejected: Cannot attach comments to synthesized AST nodes created by live tools
(no source positions). The Go team, who chose this approach (their flat Comments list),
call it "the single biggest mistake" in their AST design. Known idempotency bugs in
Gleam's formatter stem from this approach.
Alternative C: Concrete Syntax Tree
Represent every token including comments as a tree node.
Rejected: Major rearchitecture of the parser and all downstream consumers. Every tree walk must handle comment siblings. Appropriate if Beamtalk adopts tree-sitter; premature without it.
Alternative D: Preserve Token Stream Alongside AST (TypeScript/Roslyn style)
Keep the AST unchanged. Preserve the token stream from the lexer alongside the AST. The formatter reconstructs comments by walking tokens using their positions relative to AST node spans.
This is the approach used by TypeScript, Roslyn (C#), and rust-analyzer. It has zero AST churn and zero impact on downstream consumers (codegen, lint, LSP).
Rejected: Fails the synthesized-AST use case. A class created in a live tool has
no token stream — the formatter cannot emit comments because there are no tokens to
walk. For formatting existing source files this approach works well, but Beamtalk's
persistence model requires unparsing ASTs that may never have been parsed from source.
Additionally, the token stream duplicates span information already in the AST, creating
a synchronisation surface (not dual storage of comments, but dual representation of
source structure). The ast_walker and lint passes would still need to be
comment-unaware, meaning comment-dependent tooling (the formatter) uses a fundamentally
different data path than comment-independent tooling (the linter) — a split that
undermines "tooling == compiler."
Alternative E: Do Nothing / Keep Comments in Trivia Only
Keep the current behaviour. Tools that need comments re-lex the source.
Rejected: Directly violates "tooling == compiler". Means the formatter would need a separate parse, two data structures for the same file, and complex synchronisation. The exact failure mode that Go and Elixir are trying to escape from.
Consequences
Positive
- The formatter / unparser can round-trip .bt files losslessly without a secondary parse
- Live tools can synthesise new AST nodes with comments attached as plain data
- One parse serves all consumers: compiler, LSP, formatter, linter, live tools
- CommentAttachment is a natural extension of the existing doc_comment pattern (ADR 0008)
- Codegen is unaffected: it skips CommentAttachment fields silently
Negative
- Significant AST churn. Method bodies and block bodies change from Vec<Expression> to Vec<ExpressionStatement>. Of the 14 Vec<Expression> declarations in ast.rs, 6 are statement-position fields that change (Module.expressions, ClassDefinition.expressions, MethodDefinition.body × 3 variants, Block.body). The remaining declarations (MessageSend.arguments, list/array elements, etc.) are sub-expression positions and stay as Vec<Expression>. Downstream, 35 files contain ~190 .body references across codegen, lint, LSP, semantic analysis, and tests. All pattern matches and iteration at statement positions must unwrap ExpressionStatement to reach Expression. The ast_walker module (BT-961) must be updated to traverse ExpressionStatement. Test helpers that construct body: vec![expr1, expr2] gain boilerplate wrapping.
- Parser must attach comments during construction rather than discarding trivia, a moderate complexity increase
- CommentAttachment adds fields to ClassDefinition and MethodDefinition
- Memory overhead. Each ExpressionStatement adds ~32 bytes (the inline Vec<Comment> header plus Option<Comment>) even when no comments are present. Consider Option<Box<CommentAttachment>> (8 bytes when None) if profiling shows this matters; the decision defers optimisation to implementation.
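The memory trade-off can be checked directly with std::mem::size_of. The sketch below uses a stand-in Comment containing only a String, so the inline figure it reports will not match the real type; inline_size and boxed_size are hypothetical helper names. The boxed figure relies on Rust's guarantee that Option<Box<T>> is pointer-sized.

```rust
use std::mem::size_of;

/// Stand-in: the real `Comment` also carries a kind and a span.
pub struct Comment {
    pub content: String,
}

#[derive(Default)]
pub struct CommentAttachment {
    pub leading: Vec<Comment>,
    pub trailing: Option<Comment>,
}

/// Bytes paid per statement when the attachment is stored inline.
pub fn inline_size() -> usize {
    size_of::<CommentAttachment>()
}

/// Bytes paid per statement when the attachment is boxed behind an Option;
/// `None` costs one pointer and allocates nothing.
pub fn boxed_size() -> usize {
    size_of::<Option<Box<CommentAttachment>>>()
}
```

The boxed form trades a heap allocation on commented statements for a smaller footprint on the uncommented majority, which is why the choice is best left to profiling.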
Neutral
- Module.leading_comments is replaced by leading comments on the first item (non-empty modules) or Module.file_leading_comments (empty modules): equivalent semantics, different location
- ExpressionStatement is a new wrapper type; it does not affect language semantics
- Comments remain invisible to the codegen pipeline
- Error recovery: Near Expression::Error nodes (from parse errors), comment attachment proceeds normally, with comments attaching to the nearest valid node. If a parse error prevents node construction, orphaned comments attach to the error recovery node. This is no worse than the current behaviour (comments are discarded entirely).
- REPL scope boundary: REPL input is parsed as a standalone expression sequence. Comments entered in the REPL attach to ExpressionStatement nodes within that sequence. They do not persist across REPL evaluations (the REPL evaluates and discards the AST). This is correct: REPL comments are transient.
Implementation
Phase 1: AST Types (S)
- Add CommentAttachment struct to ast.rs
- Add ExpressionStatement wrapper with convenience constructor (ExpressionStatement::bare(expr))
- Update ClassDefinition and MethodDefinition with comments: CommentAttachment
- Add comments: CommentAttachment and doc_comment: Option<String> to StateDeclaration (state fields need both // and /// captured at parse time)
- Change Vec<Expression> to Vec<ExpressionStatement> in statement-position fields (~6 fields in ast.rs: Module.expressions, ClassDefinition.expressions, MethodDefinition.body × 3 variants, Block.body)
- Update Module: replace leading_comments with file_leading_comments: Vec<Comment> (used only for empty modules; non-empty modules carry comments on their first item)
Affected: crates/beamtalk-core/src/ast.rs
Phase 2: Parser (M)
- Extend collect_comment_attachment() to handle LineComment and BlockComment trivia (extending the existing collect_doc_comment() pattern)
- Attach CommentAttachment when constructing ExpressionStatement nodes
- Attach CommentAttachment and collect doc_comment when constructing ClassDefinition, MethodDefinition, and StateDeclaration
- Codegen skips StateDeclaration.doc_comment (no runtime target yet; deferred to FieldDescriptor ADR)
- Update parse_module() to remove leading_comments collection
- Ensure /// doc comments are not duplicated into CommentAttachment (dedup rule)
Affected: crates/beamtalk-core/src/source_analysis/parser/mod.rs
Phase 3: Downstream Updates (L)
This is the largest phase by volume. ~35 files reference .body on methods/blocks and
must unwrap ExpressionStatement. Key subsystems:
- Update ast_walker module (BT-961) to traverse ExpressionStatement
- Update codegen to traverse Vec<ExpressionStatement> (skip comments field)
- Update all 6 lint passes to traverse ExpressionStatement
- Update semantic analysis validators
- Update LSP providers
- Update all AST construction in tests (use ExpressionStatement::bare() helper)
Affected: crates/beamtalk-core/src/ast_walker.rs,
crates/beamtalk-core/src/codegen/, crates/beamtalk-core/src/lint/,
crates/beamtalk-core/src/source_analysis/, crates/beamtalk-lsp/src/
Phase 4: Unparser / Formatter (L)
- Implement unparse_expression_statement() emitting leading and trailing comments
- Implement full AST → source unparser for all node types
- Implement beamtalk fmt CLI command using the unparser
- Add fmt-beamtalk target to Justfile
- Round-trip validation: parse → unparse → parse must produce equivalent ASTs
- Update extract_method_source to use the unparser instead of raw source slicing. The current implementation (source[span.start..span.end]) misses leading comments (they are before method.span.start()) and falls back to the selector name for synthesized methods. Once the unparser exists, CompiledMethod.source should be generated by unparsing the MethodDefinition node, giving complete, comment-inclusive, formatter-normalised source for all methods whether parsed from a .bt file or constructed programmatically by a live tool.
Affected: crates/beamtalk-core/src/unparse/ (new module),
crates/beamtalk-cli/src/commands/fmt.rs (new command),
crates/beamtalk-core/src/codegen/core_erlang/gen_server/methods.rs (extract_method_source)
Migration Path
This is an internal compiler change. There is no change to Beamtalk syntax or
semantics. Existing .bt files are unaffected. The migration is entirely within the
Rust codebase:
- All Vec<Expression> in statement positions become Vec<ExpressionStatement>
- Pattern matches on method/class nodes gain a comments field (ignored with .. in existing code until consumers are updated)
- The compiler, linter, and LSP must update traversal code but produce identical output
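The effect of the rest pattern on existing consumers can be seen in a reduced example. MethodDefinition here has only four illustrative fields (the body is a list of strings for brevity), and describe is a hypothetical consumer written before the comments field existed.

```rust
#[derive(Default)]
pub struct CommentAttachment {
    pub leading: Vec<String>,
    pub trailing: Option<String>,
}

/// Reduced stand-in for the real node; field set is illustrative only.
pub struct MethodDefinition {
    pub comments: CommentAttachment,
    pub doc_comment: Option<String>,
    pub selector: String,
    pub body: Vec<String>,
}

/// A consumer that only needs the selector and body. The `..` rest pattern
/// means adding `comments` to the struct does not break this destructuring.
pub fn describe(method: &MethodDefinition) -> String {
    let MethodDefinition { selector, body, .. } = method;
    format!("{selector} ({} statements)", body.len())
}
```

Code that constructs nodes has no such escape hatch: every struct literal must name the new field, which is where a helper such as ExpressionStatement::bare() earns its keep in tests.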
A linked ADR will cover dirty/clean class tracking in the workspace — knowing which in-memory class representations have been modified since last written to disk, so persistence can be selective.
A second linked ADR will cover FieldDescriptor objects — first-class runtime
objects (consistent with Class and CompiledMethod) that expose .name, .doc,
.typeAnnotation, and .defaultValue via message sends. That ADR will define:
- The compilation target for StateDeclaration.doc_comment and StateDeclaration.type_annotation
- Runtime storage of type annotations on CompiledMethod (parameter types, return type) for consistency
- The MyClass fields accessor API returning a collection of FieldDescriptor instances
Until that ADR lands, field doc comments and type annotations are collected at parse time and available throughout the AST pipeline (formatter, LSP, type checker) but not compiled to runtime objects.
Implementation Tracking
Epic: BT-972
Issues: BT-973, BT-974, BT-975, BT-976, BT-977, BT-978
Status: Planned
| Issue | Title | Phase |
|---|---|---|
| BT-973 | Add CommentAttachment/ExpressionStatement types and comment fields to AST | 1 |
| BT-974 | Migrate statement-position Vec | 1 |
| BT-975 | Parser: attach comments to ClassDefinition, MethodDefinition, StateDeclaration | 2 |
| BT-976 | Parser: attach comments to ExpressionStatement; handle file_leading_comments | 2 |
| BT-977 | Implement AST unparser and update extract_method_source | 3 |
| BT-978 | Implement beamtalk fmt CLI command | 3 |
References
- Related issues: BT-962 (lint cleanup epic), BT-963–966 (lint violation fixes)
- Related ADRs:
  - ADR 0008: Doc Comments and API Documentation (established doc_comment: Option<String> on AST nodes)
  - ADR 0033: Runtime-Embedded Documentation (superseded ADR 0008's EEP-48 chunk generation; /// now compiles to doc: message sends populating CompiledMethod.doc at load time)
  - ADR 0018: Document Tree Code Generation (Wadler-Lindig; unparser will use the same Document API)
  - ADR 0024: Static-First, Live-Augmented IDE Tooling ("tooling == compiler" principle)
  - ADR 0035: Field-Based Reflection API (renamed instVar→field; FieldDescriptor is the natural next step)
  - Future ADR: FieldDescriptor objects, the runtime compilation target for StateDeclaration.doc_comment
- Prior art: