ADR 0041: Universal State-Threading Block Protocol
Status
Accepted (2026-02-24)
Context
The Composability Problem
Beamtalk compiles to Core Erlang, which runs on the BEAM — a platform with no mutable variables. When Beamtalk code mutates a variable inside a block (closure), the compiler must thread state through the generated code: packing mutated variables into a StateAcc map, passing it through recursive calls, and unpacking afterward.
Today, this state threading is gated by a hardcoded whitelist of ~20 control-flow selectors in is_control_flow_selector():
whileTrue: whileFalse: timesRepeat: do: collect:
select: reject: ensure: to:do: to:by:do:
inject:into: on:do: ifTrue: ifFalse: ifNil: ifNotNil:
ifTrue:ifFalse: ifNil:ifNotNil:
Only literal blocks passed to these selectors receive state threading. User-defined higher-order methods (HOMs — methods that accept block arguments, such as custom iterators or visitors) do not participate — mutations in blocks passed to custom methods silently fail to propagate.
Example of the Failure
"Works — whileTrue: is whitelisted"
count := 0.
[count < 10] whileTrue: [count := count + 1].
count "=> 10"
"Fails — myCustomLoop: is NOT whitelisted"
MyCollection >> myCustomLoop: aBlock
self do: [:each | aBlock value: each].
count := 0.
items myCustomLoop: [:x | count := count + x].
count "=> 0 — mutations lost!"
This breaks Smalltalk's fundamental assumption that blocks are composable. A library author cannot write custom iteration methods that work with mutating blocks, even though equivalent stdlib methods work fine.
Current Architecture
The state-threading pipeline has four stages:
-
Semantic analysis (
block_context.rs):is_control_flow_selector()classifies blocks intoControlFlow,Stored,Passed,Other, orUnknowncontexts based on a hardcoded whitelist. -
Block mutation analysis (
block_analysis.rs):BlockMutationAnalysistrackslocal_reads,local_writes,captured_reads,field_reads,field_writes, andhas_self_sendsper block.needs_mutation_threading()decides whether to use stateful codegen. -
Control-flow codegen (
while_loops.rs,counted_loops.rs,list_ops.rs): Each whitelisted selector has a dedicated codegen path that generates tail-recursive loops with pack/unpack/restore usingStateAccmaps. The loop signature isfun(StateAcc) -> {Result, StateAcc}. -
Non-local returns (
value_type_codegen.rs,mod.rs):^inside blocks throws{'$bt_nlr', Token, Value}(value types) or{'$bt_nlr', Token, Value, State}(actors). Methods with blocks containing^are wrapped in try/catch.
The Two-Convention Problem
The current system uses two different calling conventions for blocks:
- Whitelisted control flow:
fun(StateAcc) -> {Result, NewStateAcc}— state is threaded - Everything else:
fun(Args) -> Result— plain closure, no state
This duality means:
- Every new collection method must be added to the whitelist
- User-defined HOMs can never participate in state threading
- The
BlockContextenum (ControlFlow/Stored/Passed) exists solely to route between these two paths - Two parallel codegen paths (pure and stateful) must be maintained for every control-flow construct
Decision
Replace the hardcoded control-flow whitelist with a universal block protocol where all blocks that capture mutable state use fun(Args..., StateAcc) -> {Result, NewStateAcc}. This makes state threading a first-class part of the block calling convention rather than a special case for known selectors.
The Universal Block Calling Convention
All blocks use the stateful signature when passed to user-defined HOMs or stored in variables:
fun(Arg1, ..., ArgN, StateAcc) -> {Result, NewStateAcc}
StateAccis an Erlang map containing the current values of all captured mutable variables (local variables and fields)- The block returns a 2-tuple:
{Result, NewStateAcc}whereNewStateAccreflects any mutations - State keys use prefixed names:
'__local__varName'for locals,'field__fieldName'for fields - For pure blocks (no mutations),
StateAccpasses through unchanged:{Result, StateAcc}
This uniform convention means callers never need to know whether a block mutates. Every block has the same calling interface. A HOM like eachPair: always passes StateAcc as the final argument and always destructures {Result, NewState} from the return — regardless of what the block does internally.
Two-Tier Optimization
The uniform convention applies to all blocks at unknown call sites (HOMs, stored blocks, cross-module calls). At known inline call sites — where the compiler controls both the block and the caller — the compiler may optimize away the protocol:
- Known inline sites (current whitelisted control flow:
do:,collect:,whileTrue:, etc.): The compiler generates pack/unpack scaffolding directly. Pure blocks at these sites can skip the state parameter entirely, since the compiler generates both the block and the loop. - Unknown call sites (user-defined HOMs, stored blocks, cross-module calls): All blocks use the
fun(Args..., StateAcc) -> {Result, NewStateAcc}convention uniformly.
This two-tier approach preserves zero-overhead codegen for the hot path (stdlib iteration) while enabling universal composability for user-defined HOMs. The key distinction: the whitelist no longer gates whether state threading happens — it only serves as an optimization hint for inlineable call sites.
Caller-Side Protocol
When a method receives a block argument (a HOM), the compiler generates code that:
- Passes
StateAccas the final argument to every block invocation - Destructures
{Result, NewState}from the return - Threads
NewStateto the next invocation (in loops) or returns it (for single calls)
%% Compiled body of: eachPair: aBlock
%% The compiler always generates the stateful calling convention
'eachPair:'/3 = fun (Self, ABlock, State) ->
letrec 'loop'/2 = fun (I, StateAcc) ->
case call 'erlang':'<'(I, call 'beamtalk_dispatch':'dispatch'(Self, 'size', [])) of
<'true'> when 'true' ->
let A = call 'beamtalk_dispatch':'dispatch'(Self, 'at:', [I]) in
let B = call 'beamtalk_dispatch':'dispatch'(Self, 'at:', [call 'erlang':'+'(I, 1)]) in
let {_Result, StateAcc1} = apply ABlock (A, B, StateAcc) in
apply 'loop'/2 (call 'erlang':'+'(I, 2), StateAcc1)
<'false'> when 'true' -> {'nil', StateAcc}
end
in apply 'loop'/2 (0, State)
The block argument ABlock is always called with StateAcc and always returns {Result, NewState}. This is true whether ABlock mutates state or not — a pure block simply returns {Result, StateAcc} unchanged.
Erlang Interop Boundary
When a Beamtalk block is passed to Erlang/Elixir code (which doesn't know about the state protocol), the compiler generates a wrapper fun that strips the protocol:
%% Beamtalk block: fun(X, StateAcc) -> {X + 1, StateAcc}
%% Wrapped for Erlang: fun(X) -> X + 1
%% The wrapper captures CurrentStateAcc at creation time so reads succeed,
%% but mutations are dropped (Erlang cannot propagate the updated state).
let ErlangFun = fun (X) ->
let {Result, _State} = apply BeamtalkBlock (X, CurrentStateAcc) in
Result
end
The wrapper captures the enclosing CurrentStateAcc so that stateful blocks can read their captured variables. However, mutations made inside the block are discarded — the updated _State is not propagated back to the Beamtalk caller. The compiler should emit a warning when a stateful block is passed to an Erlang call site, since mutations will be silently dropped.
Conversely, when Erlang funs are received by Beamtalk HOMs, they are wrapped to conform to the protocol. This marshalling happens at the language boundary, preserving ADR 0028's interop contract.
Interaction with Gradual Typing (ADR 0025)
The StateAcc parameter is a compiler implementation detail hidden from the type system. Block types reflect user-visible arity only:
[:x | x + 1]has typeBlock(Integer -> Integer)— the hidden StateAcc parameter is not part of the type signature[:x | count := count + x]also has typeBlock(Integer -> Integer)— same user-visible signature, even though the compiled fun has an extra parameter
This means the type checker validates the user's mental model (declared parameters and return type), and the compiler independently handles state threading. A HOM typed as eachPair: (Block(A, A -> A)) accepts both pure and mutating blocks — the calling convention is uniform at the machine level.
This decision avoids leaking compiler internals into the type language and ensures that adding or removing a mutation inside a block never changes its type signature.
Interaction with Session/Workspace State (ADR 0040)
There are two distinct "state" concepts in play:
-
beamtalk_repl_state(Erlang record) — the REPL session infrastructure living in thebeamtalk_repl_shellgen_server. Contains bindings (session locals likex = 42),eval_counter,module_tracker,actor_registry,class_sources. This is what ADR 0040 (BT-841) restructures. -
StateAcc(Core Erlang map) — the compiler's state-threading map for mutation inside blocks. Generated by codegen, passed asfun(Args..., StateAcc) -> {Result, NewStateAcc}. (In REPL eval,Args...is empty — the outermost eval fun isfun(Bindings) -> {Result, UpdatedBindings}.) This is what BT-842 universalizes.
The overlap is at the eval boundary (beamtalk_repl_shell.erl:155):
{RawResult, UpdatedBindings} = apply(ModuleName, eval, [BindingsWithRegistry])
The REPL eval module receives Bindings as its argument, aliases it to State inside the generated code (repl_codegen.rs:30), and returns {Result, UpdatedBindings}. This is already the same {Result, State} tuple shape that BT-842 universalizes.
Today they are one map, and they must stay one map. The REPL codegen does let State = Bindings, then control-flow blocks thread that same State. Mutations to session locals inside whileTrue: blocks work because they share the map. The universal protocol changes nothing here — it just means every block threads this map, not just whitelisted ones.
The REPL eval function is just the outermost "block" that receives the session's StateAcc. If a future Session object (as ADR 0040 left open) owns bindings as a first-class object, its bindings map must be the same map that blocks thread. Two maps (session bindings + block locals) would require merge/split at every block boundary — exactly the complexity this ADR eliminates.
Generated Core Erlang Examples
Simple block (no mutations — unchanged)
items collect: [:x | x + 1]
%% Pure block — no StateAcc parameter, no tuple return
call 'lists':'map'(fun (Elem) -> call 'erlang':'+'(Elem, 1) end, Items)
whileTrue: with mutation (current pattern, now universal)
count := 0.
[count < 10] whileTrue: [count := count + 1].
count
%% Pack: store 'count' into StateAcc before loop
let Packed0 = call 'maps':'put'('__local__count', Count, State) in
letrec 'while'/1 = fun (StateAcc) ->
%% Unpack: read 'count' from StateAcc at each iteration
let Count = call 'maps':'get'('__local__count', StateAcc) in
%% Condition (inline — known control-flow site, no fun wrapper)
let CondResult = call 'erlang':'<'(Count, 10) in
case CondResult of
<'true'> when 'true' ->
%% Body: mutate count, update StateAcc
let NewCount = call 'erlang':'+'(Count, 1) in
let StateAcc1 = call 'maps':'put'('__local__count', NewCount, StateAcc) in
apply 'while'/1 (StateAcc1)
<'false'> when 'true' -> {'nil', StateAcc}
end
in apply 'while'/1 (Packed0)
User-defined HOM with mutation (NEW — previously impossible)
MyIterator >> eachPair: aBlock
| i |
i := 0.
[i < self size] whileTrue: [
aBlock value: (self at: i) value: (self at: i + 1).
i := i + 2
].
"Usage — count mutations now propagate through eachPair:"
count := 0.
pairs eachPair: [:a :b | count := count + a + b].
count "=> correct sum"
%% The block passed to eachPair: — compiled with StateAcc parameter
fun (A, B, StateAcc) ->
let Count = call 'maps':'get'('__local__count', StateAcc) in
let Sum = call 'erlang':'+'(call 'erlang':'+'(Count, A), B) in
let StateAcc1 = call 'maps':'put'('__local__count', Sum, StateAcc) in
{Sum, StateAcc1}
end
%% eachPair: — compiled to always use the stateful protocol
%% It passes StateAcc to the block and destructures {Result, NewState}
'eachPair:'/3 = fun (Self, ABlock, State) ->
letrec 'loop'/2 = fun (I, StateAcc) ->
case call 'erlang':'<'(I, call 'beamtalk_dispatch':'dispatch'(Self, 'size', [])) of
<'true'> when 'true' ->
let A = call 'beamtalk_dispatch':'dispatch'(Self, 'at:', [I]) in
let B = call 'beamtalk_dispatch':'dispatch'(Self, 'at:', [call 'erlang':'+'(I, 1)]) in
let {_Result, StateAcc1} = apply ABlock (A, B, StateAcc) in
apply 'loop'/2 (call 'erlang':'+'(I, 2), StateAcc1)
<'false'> when 'true' -> {'nil', StateAcc}
end
in apply 'loop'/2 (0, State)
The key insight: eachPair: always uses the stateful protocol. It doesn't need to know whether the block mutates — it passes StateAcc and destructures the return uniformly. A pure block like [:a :b | a + b] would compile to fun(A, B, StateAcc) -> {call 'erlang':'+'(A, B), StateAcc} — state passes through unchanged.
Stored closures that mutate
count := 0.
myBlock := [:x | count := count + x].
10 timesRepeat: [myBlock value: 1].
count "=> 10"
%% Stored block: compiled with StateAcc parameter
%% At creation, captures nothing (state comes in at call time)
let MyBlock = fun (X, StateAcc) ->
let Count = call 'maps':'get'('__local__count', StateAcc) in
let NewCount = call 'erlang':'+'(Count, X) in
let StateAcc1 = call 'maps':'put'('__local__count', NewCount, StateAcc) in
{NewCount, StateAcc1}
end in
%% timesRepeat: calls: apply MyBlock (1, StateAcc) and threads result state
let Packed0 = call 'maps':'put'('__local__count', Count, State) in
letrec 'loop'/2 = fun (I, StateAcc) ->
case call 'erlang':'>'(I, 0) of
<'true'> when 'true' ->
let {_Result, StateAcc1} = apply MyBlock (1, StateAcc) in
apply 'loop'/2 (call 'erlang':'-'(I, 1), StateAcc1)
<'false'> when 'true' -> {'nil', StateAcc}
end
in apply 'loop'/2 (10, Packed0)
State-Carrying Non-Local Returns
Currently, ^ inside blocks generates:
- Value types:
throw({'$bt_nlr', Token, Value})— 3-tuple, no state - Actors:
throw({'$bt_nlr', Token, Value, State})— 4-tuple, carries state
With universal state threading, all NLR throws must carry state, because field mutations accumulated before the ^ would otherwise be silently lost:
"detect: uses ^ inside do: — state must be preserved"
MyCollection >> detect: aBlock
self do: [:each |
(aBlock value: each) ifTrue: [^each]
].
nil
%% Universal protocol: ^ always carries state
%% Value type: now 4-tuple WITH state (was 3-tuple without)
call 'erlang':'throw'({'$bt_nlr', Token, Value, CurrentStateAcc})
%% Actor: unchanged (already carries state)
call 'erlang':'throw'({'$bt_nlr', Token, Value, CurrentStateAcc})
The try/catch at the method boundary extracts and applies the state:
%% Value type method with NLR
let NlrToken = call 'erlang':'make_ref'() in
try
{Result, FinalState} = <method body>
of {NlrResult, NlrState} -> {NlrResult, NlrState}
catch <NlrCls, NlrErr, NlrStk> ->
case {NlrCls, NlrErr} of
<{'throw', {'$bt_nlr', CatchTok, NlrVal, NlrState}}>
when call 'erlang':'=:='(CatchTok, NlrToken) ->
{NlrVal, NlrState} %% preserve mutations up to the ^
<OtherPair> when 'true' ->
primop 'raw_raise'(NlrCls, NlrErr, NlrStk)
end
This ensures that detect:, anySatisfy:, includes: (which use ^ inside do:) preserve all field mutations made before the early return.
Performance: Pure Block Optimization
The critical optimization: blocks with no captured mutations skip the protocol entirely.
The compiler already performs BlockMutationAnalysis on every block. When a block has no local_writes, no field_writes, and no has_self_sends, it compiles to a plain fun(Args) -> Result with zero overhead.
In practice, the majority of blocks are pure:
[:x | x + 1]— pure, no state parameter[:x | x > 0]— pure, no state parameter[:x | x printString]— pure (no captured mutation), no state parameter[:x | count := count + x]— mutatescount, gets state parameter
Estimated overhead per block invocation:
- Pure blocks at unknown call sites: ~20ns for
{Result, StateAcc}tuple allocation + unwrap. State passes through unchanged. - Stateful blocks: ~20ns tuple allocation + ~10ns per captured variable for
maps:get/2(unpack) + ~15ns per mutated variable formaps:put/3(repack). For a block capturing 2 variables and mutating 1: ~65ns total overhead. - Pure blocks at known inline sites (stdlib control flow): zero overhead — compiler generates direct code.
For context, a process dictionary get/put pair costs 100-500ns, and an ETS read costs ~1μs. The tuple+map approach is the cheapest mutable-state mechanism available on BEAM, and the two-tier optimization ensures stdlib hot paths pay nothing.
Reclassification of BlockContext and Whitelist
The is_control_flow_selector() whitelist is reclassified from a correctness gate to an optimization hint. It is not deleted — it continues to identify Tier 1 inline sites where the compiler can generate optimized pack/unpack codegen without the universal protocol overhead.
| Current | Universal Protocol |
|---|---|
BlockContext::ControlFlow — gates state threading | Optimization hint — triggers Tier 1 inline codegen |
BlockContext::Stored — compiled as plain fun | Tier 2 — stored blocks get fun(Args..., StateAcc) -> {Result, NewStateAcc} |
BlockContext::Passed — compiled as plain fun | Tier 2 — passed blocks get stateful convention |
BlockContext::Other — compiled as plain fun | Tier 2 — same as Stored/Passed; no special handling |
BlockContext::Unknown — compiled as plain fun | Tier 2 — unknown sites default to stateful convention |
is_control_flow_selector() — correctness gate | Optimization hint — identifies Tier 1 fast path |
classify_block() function | Retained — routes between Tier 1 and Tier 2 codegen |
The key change: blocks at Stored/Passed/Unknown sites now get state threading (previously they got nothing). The whitelist no longer prevents state threading — it identifies sites where the compiler can skip the universal protocol and use optimized inline codegen instead.
Impact on Existing Codegen Modules
| Module | Change |
|---|---|
block_context.rs | Reclassified. is_control_flow_selector() becomes optimization hint (Tier 1 routing), no longer a correctness gate. classify_block() retained to route between Tier 1 and Tier 2 codegen. |
block_analysis.rs | Unchanged. BlockMutationAnalysis remains the source of truth. |
while_loops.rs | Unchanged for Tier 1. The existing dual paths (_simple / _with_mutations) remain — they are the Tier 1 optimized codegen for known inline sites. No code deleted here. |
counted_loops.rs | Unchanged for Tier 1. Same as while_loops.rs — existing _simple / _with_mutations paths stay. |
list_ops.rs | Unchanged for Tier 1. Existing generate_list_do_with_mutations() / generate_list_inject_with_mutations() paths stay. |
value_type_codegen.rs | Modified. NLR throws gain state parameter (3-tuple → 4-tuple for value types). Try/catch extracts state from the throw. |
mod.rs (main codegen) | Modified. New Tier 2 codegen: block compilation at unknown call sites emits fun(Args..., StateAcc) -> {Result, NewStateAcc}. Caller-side protocol for HOMs. Expression::Return codegen always includes state when NLR token is active. |
Net effect: code increases. The Tier 1 inline codegen (~12 existing pure/stateful function pairs across while_loops.rs, counted_loops.rs, list_ops.rs) is retained unchanged. New Tier 2 universal protocol code is added. The win is correctness (HOMs work with mutating blocks), not code reduction.
Prior Art
Haskell State Monad — s -> (a, s)
The proposed protocol is identical in shape to Haskell's State monad: newtype State s a = State { runState :: s -> (a, s) }. GHC optimizes this aggressively via newtype erasure (zero-cost wrapper), unboxed tuples (no heap allocation for the pair), and the "state hack" (treating state-threaded lambdas as single-entry for inlining).
Adopted: The s -> (a, s) return convention. This is the standard functional approach to threading mutable state through pure computations.
Not applicable: GHC's newtype erasure and unboxed tuples have no BEAM equivalent. BEAM always allocates tuples, but small tuple allocation is highly optimized (~20ns).
Kotlin inline + Ref Boxing
Kotlin wraps captured mutable variables in IntRef/ObjectRef containers. The inline keyword copies lambda bodies at call sites, eliminating the Ref overhead for known call sites.
Rejected for Beamtalk: On BEAM, the equivalent of Ref boxing would require process dictionary or ETS, which are 5-25x more expensive than tuple returns. Kotlin's approach works because JVM field writes cost ~1ns; BEAM has no equivalent.
Insight adopted: Kotlin's split between inline (zero overhead, incomplete) and non-inline (Ref overhead, universal) mirrors Beamtalk's current whitelist vs. the proposed universal protocol. We choose universality over zero-overhead incompleteness.
Elixir/Erlang SSA and Accumulator Threading
Erlang's SSA compiler (beam_ssa) already optimizes accumulator-threaded loops with tail-call optimization and Y-register allocation. Enum.reduce/3 is the canonical Elixir pattern: Enum.reduce(list, acc, fn x, acc -> acc + x end).
Adopted: The universal protocol generates code that matches BEAM's native accumulator patterns, ensuring the SSA optimizer can work effectively. The generated lists:foldl-style loops are idiomatic BEAM code.
Clojerl (Clojure on BEAM)
Clojerl implements mutable atoms using dedicated Erlang processes (with a proposal to switch to ETS + compare-and-swap). Every swap!/reset! requires inter-process communication.
Cautionary tale: External mutable state on BEAM is expensive. Process-based atoms cost microseconds per operation vs. nanoseconds for tuple returns. This strongly validates the value-threading approach over Ref-boxing.
Gleam — gleam-eval Library
Gleam's community-developed gleam-eval library implements exactly the fn(state) -> #(state, result) pattern as a general-purpose state-threading mechanism. Combined with Gleam's use expressions, this provides ergonomic implicit state threading on BEAM.
Validation: The fact that this pattern emerged independently as a community library on BEAM confirms it is natural, practical, and performant for the target platform.
Pony — Reference Capabilities
Pony uses compile-time reference capabilities to prevent data races. Closures copy captured variables — mutating a captured variable inside a lambda does NOT affect the outer scope. Workarounds include array indirection or actor delegation.
Insight: Pony's approach of making closures NOT alias outer mutable variables is the opposite of what Beamtalk aims for. However, Pony's distinction between "values you can mutate" and "values you can share" inspired the pure/stateful block optimization: pure blocks (shareable, no state) vs. stateful blocks (mutable captures, need state threading).
User Impact
Newcomer (from Python/JS/Ruby)
Positive: Mutation in blocks "just works" everywhere — no mysterious failures when passing blocks to custom methods. The mental model is simple: "variables you assign to inside a block stay assigned."
Neutral: No new syntax to learn. The protocol is entirely compiler-internal.
Concern: When passing blocks to Erlang libraries, the wrapper marshalling adds a layer of indirection that may produce confusing error messages if something goes wrong.
Smalltalk Developer
Positive: Block composability is restored. Library authors can write custom iteration methods (myCustomLoop:, eachPair:, visitNodes:) and callers can pass mutating blocks, matching Smalltalk's fundamental promise.
Neutral: The distinction from Smalltalk images (where variables are truly mutable slots) is invisible — the compiled behavior matches expectations.
Concern: The two-tier optimization (inline for stdlib, universal for HOMs) is an implementation detail that occasionally leaks — performance characteristics differ between items do: [...] and items myDo: [...] even if they do the same thing.
Erlang/Elixir Developer
Positive: Generated code follows idiomatic BEAM patterns (accumulator threading via foldl-style loops). The {Result, State} convention is familiar from gen_server callbacks and Enum.reduce.
Neutral: The state map in crash dumps is readable and debuggable. Variable names are preserved as map keys ('__local__count').
Concern: Blocks passed across the Erlang boundary require wrapper funs (see Erlang Interop Boundary). Erlang code receiving a raw Beamtalk block would see an unexpected extra arity. The wrapper captures the enclosing StateAcc at creation time so reads succeed, but mutations made inside the block are silently dropped since Erlang cannot propagate the updated state. The compiler emits a warning when a stateful block is passed to an Erlang call site.
Production Operator
Positive: One calling convention is more predictable than two. Fewer special cases means fewer edge-case bugs in production.
Neutral: Performance overhead (tuple wrap/unwrap) is negligible for stateful blocks and zero for inline-optimized blocks.
Concern: Hot code reloading a class that changes a block's mutation status (e.g., adding a field write to a formerly pure block) is safe under the universal protocol (arity doesn't change), but the two-tier optimization for stdlib means inline-optimized blocks could theoretically change shape. The implementation must ensure that inline optimization decisions are stable across reloads.
Tooling Developer (LSP, IDE)
Positive: Eliminating the BlockContext classification from the codegen path simplifies the compilation pipeline. The LSP no longer needs to know which selectors are "special" for correctness.
Consideration: IDE tooltips could show whether a block is compiled as pure-passthrough or actively-stateful, aiding debugging. The BlockContext enum may be retained for this purpose even though it no longer gates codegen.
Steelman Analysis
Best Argument for Option B (Pragma Whitelist) — the strongest rejected alternative
| Cohort | Their strongest argument for keeping the whitelist |
|---|---|
| Newcomer | "Stdlib methods already work. I'd rather have zero overhead for do: and collect: than a universal protocol I'll never notice." |
| Smalltalk purist | "Pragmas are well-precedented in Pharo. <controlFlow: 0> is explicit and self-documenting. Magic conventions are worse than explicit annotations." |
| BEAM veteran | "Zero overhead for the common case. 99% of blocks are passed to stdlib methods. Why pay tuple overhead for all of them when the whitelist covers everything users actually use?" |
| Operator | "No new calling convention means no new failure modes. The whitelist has been stable and predictable. Adding a universal protocol is a risk." |
| Language designer | "Minimal change, maximum backwards compatibility. The whitelist works for stdlib; the pragma extends it to user code. Two mechanisms (whitelist + pragma) are simpler than rewriting all block codegen." |
Why Option D Wins Despite the Steelman
The BEAM veteran's argument is the strongest: zero overhead for the common case. However:
- The "99% stdlib" assumption is temporary. As the ecosystem grows, library authors will write custom HOMs. The whitelist doesn't scale.
- The pure block optimization eliminates the overhead argument. Blocks with no mutations compile to plain funs — same as today. Only blocks that actually mutate pay the tuple cost.
- Two calling conventions is permanent complexity. Every future codegen feature must handle both paths. One convention is fundamentally simpler to maintain, test, and debug.
- Pre-release is the right time. After v1.0, changing the block calling convention would be a breaking change. Now it's free.
Tension Points
- BEAM veterans and operators prefer Option B's conservatism — valid, but outweighed by long-term architectural simplicity
- Newcomers and Smalltalk purists strongly prefer Option D — blocks should "just work"
- Language designers are split: B is lower-risk, D is architecturally superior
- The deciding factor: one calling convention eliminates an entire category of compiler complexity (the whitelist,
BlockContextenum, dual codegen paths)
Alternatives Considered
Alternative 0: Status Quo (Do Nothing)
Keep the current hardcoded whitelist. Accept that user-defined HOMs don't support mutating blocks.
Rejected because:
- The composability failure is not theoretical — it will bite every library author who writes a custom iterator
- The whitelist requires manual maintenance for every new collection method
- Pre-release is the time to fix foundational issues; after v1.0 this becomes a permanent limitation
- However, it's worth noting: the whitelist covers all stdlib iteration patterns today, and the ecosystem is small. The pain is real but currently low-frequency.
Alternative A: Process Dictionary Boxing (Kotlin-style Ref)
Store mutated variables in the process dictionary. Blocks capture a key, reads/writes go through get/put.
%% count := 0. items do: [:x | count := count + x]
put('__local__count', 0),
lists:foreach(fun(X) -> put('__local__count', get('__local__count') + X) end, Items)
Rejected because:
- Process dict operations cost 100-500ns per read/write (vs. ~20ns for tuple return)
- Clojerl's experience confirms external mutable state on BEAM is expensive
- Process dict state is global within a process — no isolation between nested scopes
- Complicates actor state management (process dict vs. gen_server state)
Alternative B: User-Extensible Whitelist (Pragma)
Keep the current whitelist but let library authors add methods via pragma:
MyCollection >> customLoop: aBlock
<controlFlow: 0>
...
Rejected because:
- Leaky abstraction — users must know about the annotation to make their HOMs work
- Two calling conventions remain — all dual-path codegen complexity persists
- Doesn't solve stored blocks (
myBlock := [count := count + 1]) - Pragmas are viral — every HOM in the chain must be annotated
Alternative C: Inline Expansion
Inline block bodies at call sites, eliminating the closure boundary:
Rejected because:
- BEAM does not inline across module boundaries
- Same-module-only restriction defeats the purpose (library HOMs are cross-module)
- Code bloat from inlining large blocks
- Doesn't compose: inline expansion is all-or-nothing per call site
Alternative D-Hybrid: Universal for HOMs, Optimized for Stdlib
Keep optimized codegen for whitelisted stdlib selectors (current fast path), but compile all blocks passed to user-defined HOMs with the universal protocol.
This is a tempting middle ground that preserves zero-overhead stdlib iteration while enabling HOM composability. The current implementation already has dual codegen paths — this approach keeps them and removes only the whitelist gate.
Rejected in favor of the full universal protocol because:
- The whitelist is retained as an optimization hint, not eliminated — maintenance burden persists for new stdlib methods
- Two calling conventions still exist, so every future codegen feature (e.g., async blocks, pattern-matching blocks) must handle both paths
- The two-tier optimization in the chosen approach achieves the same performance benefit (inline stdlib, universal HOMs) without requiring the whitelist to be a correctness gate — it becomes a pure performance optimization that can be extended incrementally
Consequences
Positive
- Block composability restored. User-defined HOMs work with mutating blocks, matching Smalltalk's fundamental promise.
- Whitelist reclassified. The hardcoded selector list becomes an optimization hint, not a correctness gate. New collection methods get state threading automatically without whitelist changes.
BlockContextrepurposed. The classification is retained but now routes between Tier 1 (optimized inline) and Tier 2 (universal protocol) codegen, rather than gating correctness.- One calling convention at unknown sites. HOMs, stored blocks, and cross-module blocks all use the same protocol. Future codegen features for these sites only need one path.
- Stored blocks work.
myBlock := [count := count + 1]can be passed to any HOM and mutations propagate. - NLR state preservation. Non-local returns (
^) now carry accumulated state, preventing silent mutation loss indetect:,anySatisfy:, etc.
Negative
- Tuple overhead for stateful blocks. Each invocation of a stateful block allocates a 2-tuple and performs maps:get/put per captured variable. Full cost: ~65ns for a typical block with 2 captured variables and 1 mutation. Mitigated by two-tier optimization (zero overhead at known inline sites).
- Large implementation scope. Touches block compilation, all control-flow codegen, NLR infrastructure, and REPL state management. Estimated L-sized effort.
- Map overhead for state access.
maps:get/2andmaps:put/3per variable per iteration. For blocks with many captured variables, this grows linearly. However, most blocks capture 1-3 variables. - Debugging complexity. State is threaded implicitly — when debugging generated Core Erlang, developers must trace
StateAccflow through nested constructs. has_self_sendsover-approximation. Currently, any block containing a self-send (even read-only likeself size) triggers state threading. This is conservative but adds unnecessary StateAcc overhead for pure query blocks. Future optimization: distinguish read-only self-sends from mutating self-sends.- Erlang interop wrapper overhead and mutation loss. Blocks passed to Erlang/Elixir code require wrapper funs that strip the state protocol. The wrapper captures the enclosing
StateAccat creation time (so reads succeed), but mutations made inside the block are dropped since Erlang cannot propagate the updated state. The compiler warns when a stateful block crosses the boundary.
Neutral
- REPL codegen. The REPL threads workspace-global state through
StateAccmaps; compiled blocks thread block-local state through per-invocationStateAcc. These are different scopes but use the same mechanism. Theis_repl_modeconditional in current codegen will remain — REPL blocks thread workspace state, compiled blocks thread captured-variable state. The protocol shape is the same even though the state contents differ. In the REPL, the workspace StateAcc is the block's StateAcc (one map, not two), since workspace bindings are the captured variables. - Actor codegen. Actors already use 4-tuple NLR throws with state. The universal protocol extends this to value types, unifying the two paths.
- Test coverage. Existing control-flow tests remain valid. New tests needed for user-defined HOMs with mutation, stored blocks with mutation, and NLR with state in value types.
Implementation
Phase 0: Wire Check (proof of concept)
- Compile one block with the universal convention (
fun(X, StateAcc) -> {Result, StateAcc}) - Compile one HOM that calls it with the stateful protocol
- Verify round-trip: block mutation propagates through the HOM back to the caller
- Verify Erlang boundary wrapper: stateful block passed to
lists:mapvia wrapper fun - Goal: Prove the core calling convention works before rewriting all codegen
- Affected:
mod.rs(block compilation), manual test case
Phase 1: Block Compilation + Caller-Side Protocol
- Modify block codegen to generate
fun(Args..., StateAcc) -> {Result, NewStateAcc}for all blocks at unknown call sites (HOMs, stored blocks) - Pure blocks at unknown call sites get passthrough:
fun(Args..., StateAcc) -> {Result, StateAcc} - Pure blocks at known inline call sites (stdlib control flow) remain optimized:
fun(Args...) -> Result - Generate caller-side protocol for methods receiving block arguments: always pass
StateAcc, always destructure{Result, NewState} - Generate Erlang boundary wrappers for blocks crossing language boundaries
- Affected:
mod.rs(block compilation + method codegen),block_analysis.rs(already correct)
Phase 2: Control-Flow Codegen Migration
- Update
while_loops.rs,counted_loops.rs,list_ops.rsto use the universal protocol for stateful blocks - Retain optimized inline codegen for known control-flow selectors (these remain the fast path)
- Unify stateful list operations around
lists:foldlwith state-aware lambdas - Affected:
control_flow/directory
Phase 3: NLR State Threading
- Update value type NLR throws to include
StateAcc(3-tuple → 4-tuple) - Update try/catch wrappers to extract state from NLR throws
- Unify actor and value type NLR paths
- Affected:
value_type_codegen.rs,mod.rs(Expression::Return)
Phase 4: Whitelist Reclassification
- Remove
is_control_flow_selector()as a correctness gate — state threading no longer depends on it - Retain the whitelist as a performance optimization hint for inline codegen
- Repurpose
BlockContextfor IDE/LSP use and inline optimization decisions - Affected:
block_context.rs,semantic_analysis/mod.rs
Phase 5: Testing and Validation
- Add tests for user-defined HOMs with mutation (the core scenario)
- Add tests for stored blocks with mutation passed to HOMs
- Add tests for NLR with state preservation (
detect:,anySatisfy:) - Add tests for Erlang interop boundary (blocks passed to Erlang funs)
- Add tests for dynamic block assignment (pure/stateful blocks in same variable)
- Verify all existing stdlib tests pass unchanged
- Performance benchmarks: inline-optimized blocks (zero overhead), universal blocks (tuple+map overhead)
Migration Path
This is a compiler-internal change with no syntax changes. Existing Beamtalk code continues to work unchanged. The only observable difference is that mutations in blocks passed to user-defined HOMs now propagate correctly — previously silent failures become correct behavior.
No user migration is required.
Implementation Tracking
Epic: BT-850 — Epic: Universal State-Threading Block Protocol (ADR 0041) Issues:
- BT-851 — PoC: Tier 2 block round-trip (Phase 0, S)
- BT-852 — Emit stateful fun signature for Tier 2 blocks (Phase 1, M)
- BT-853 — Caller-side protocol for HOM call sites (Phase 1, M)
- BT-854 — NLR state threading: 3-tuple → 4-tuple (Phase 2, M)
- BT-855 — Erlang interop wrapper funs (Phase 2, M)
- BT-856 — Reclassify whitelist as optimization hint (Phase 3, S)
- BT-857 — Tests: HOM composability, NLR, interop (Phase 4, M)
Status: Planned
References
- Related issues: BT-842
- Blocks: BT-305 (Syntax Pragmatism vs Smalltalk) — BT-842 must merge before BT-305 can proceed
- Related ADRs: ADR 0005 (BEAM Object Model), ADR 0018 (Document Tree Codegen), ADR 0025 (Gradual Typing — block arity implications), ADR 0028 (BEAM Interop — boundary wrapper design), ADR 0040 (Workspace-Native REPL Commands — session state interaction)
- Current implementation:
block_context.rs(whitelist),block_analysis.rs(mutation analysis),control_flow/(state threading codegen) - Appel, "Compiling with Continuations" (1992) — closure conversion and mutable variable boxing
- Kennedy, "Compiling with Continuations, Continued" (ICFP 2007) — CPS vs direct-style compilation
- Haskell State monad:
s -> (a, s)— same shape as the universal protocol - Gleam
gleam-evallibrary — community validation of this pattern on BEAM - Clojerl atom implementation — cautionary tale for external mutable state on BEAM