ADR 0021: Stream — Lazy Pipeline for Value-Side Data

Status

Implemented (2026-02-15), Revised (2026-03-05)

Context

Problem

Beamtalk has basic file I/O (File readAll:, File writeAll:contents:) and eager collection iteration (do:, collect:, select:), but no lazy interface for sequential data. Every data source — files, collections, generators — needs its own iteration pattern today.

Without a Stream abstraction, Beamtalk users cannot:

  1. Read files line-by-line (only File readAll: exists, which reads the entire file into memory)
  2. Process large data lazily (everything is eager — full result materialized)
  3. Compose data pipelines across different value-side sources
  4. Write code that works with any caller-owned sequential data source (file, collection, generator)

Scope Limitation — Value-Side Only

Revised 2026-03-05: The original ADR framed Stream as a "universal data interface" covering files, collections, network sockets, OS processes, and generators. Experience with ADR 0043 (sync-by-default actor messaging) and ADR 0051 (subprocess execution) has shown that Streams are fundamentally limited to value-side use cases — where the caller owns the data source and evaluates the stream in its own process.

Streams cannot cross process boundaries because:

  1. Port/file handles are process-local — the process that opens a port or file handle is the only one that can read from it. A Stream's generator closure captures a handle, so the Stream must be consumed by the same process that created it.
  2. Under ADR 0043 (sync-by-default), a message send to an actor is a gen_server:call. An actor method must return a complete value, not a lazy generator that depends on actor-internal resources. Returning a port-backed Stream from an actor is semantically broken: the generator would run in the caller's process, but the port lives in the actor's process.
  3. Proven by ADR 0051 — the Subprocess actor cannot return a Stream of stdout lines. Instead it uses readLine (a sync gen_server:call that returns the next buffered line). This is the correct pattern for cross-process sequential data on BEAM.

What this means in practice:

Context                                                Streams work?       Pattern instead
-----------------------------------------------------  ------------------  ---------------
File I/O (caller-owned handle)                         Yes                 File lines: "data.csv"
Collection transforms                                  Yes                 #(1, 2, 3) stream select: [...]
Pure generators                                        Yes                 Stream from: 1 by: [:n | n * 2]
Actor → caller data flow (via message-send generator)  Yes                 agent lines (Stream generator calls readLine via gen_server:call; ADR 0051)
Subprocess output                                      Yes                 agent lines do: [:line | ...] (no port handle crosses the boundary)
Network sockets via actors                             Yes (same pattern)  Actor exposes lines method returning message-send-backed Stream
Direct port/handle across processes                    No                  Materialize to List first, or use message-send generator pattern

Revised 2026-03-05: The original revision overstated the limitation. Streams can work across process boundaries when the generator uses message sends (gen_server:call) rather than direct resource access (port reads, file handle reads). The key insight from ADR 0051: Subprocess lines returns a Stream whose generator closure calls gen_server:call(ActorPid, {readLine, []}, infinity) — a message send that runs in the caller's process. The actor reads from the port in its own process. No resource handle crosses the boundary, only the actor's PID (which is safe to share). This "message-send generator" pattern restores Stream composability for actor-mediated data sources.
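
The message-send generator pattern can be modelled outside Beamtalk. The sketch below is an illustration in Python, not the ADR 0051 implementation: LineActor, read_line, and lines are hypothetical names, a thread stands in for the actor process, and a pair of queues stands in for gen_server:call. The point it tries to show is that the generator runs in the caller and captures only a reference to the actor, never the resource itself.

```python
import queue
import threading


class LineActor:
    """Hypothetical stand-in for an actor that privately owns a line source."""

    def __init__(self, lines):
        self._lines = iter(lines)      # the "resource": only the actor thread reads it
        self._inbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        # Serve one request at a time, like a gen_server handling calls.
        while True:
            reply_to = self._inbox.get()
            reply_to.put(next(self._lines, None))   # None signals end of data

    def read_line(self):
        # Synchronous call: send a request, then block until the reply arrives.
        reply = queue.Queue(maxsize=1)
        self._inbox.put(reply)
        return reply.get()

    def lines(self):
        # This generator runs in the caller's thread. It captures `self`
        # (the analogue of a PID), not the underlying resource.
        while (line := self.read_line()) is not None:
            yield line


actor = LineActor(["ok", "WARN disk", "ok", "WARN cpu"])
warnings = [line for line in actor.lines() if "WARN" in line]
print(warnings)
```

Because every element is fetched by a blocking request, the caller can filter or transform the lines lazily while the actor remains the only reader of its source.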

The Insight

Smalltalk's ReadStream/WriteStream (1980) and every modern language since have converged on the same idea: a uniform interface for sequential data. The implementations differ — Smalltalk used mutable position state, Elixir uses closures, Rust uses traits — but the concept is identical: select:, collect:, take:, do: should work on any data source.

Beamtalk's opportunity: implement this idea with modern (closure-based lazy) mechanics while keeping Smalltalk's elegant message-send protocol. Smalltalk's API with Elixir's engine.

Current State

File I/O (stdlib/src/File.bt):

TranscriptStream (stdlib/src/TranscriptStream.bt):

Collections (stdlib/src/List.bt, stdlib/src/Set.bt, etc.):

Constraints

  1. BEAM's I/O model is fundamentally different from Smalltalk's — ports, processes, and message passing rather than synchronous byte streams
  2. Erlang already has robust I/O: file:read_line/1, io:get_line/1, gen_tcp, ssl, and OTP's gen_statem for protocol handling
  3. Elixir's Stream module provides lazy enumeration on BEAM — proven model we can follow
  4. Interactive-first principle — Streams should work naturally in the REPL
  5. Smalltalk heritage — protocol names (select:, collect:, do:, inject:into:) must be preserved

Decision

Introduce Stream as Beamtalk's lazy pipeline for value-side sequential data: a single, closure-based type that unifies collection processing, file I/O, and pure generators under one protocol. Stream covers caller-owned data sources; cross-process data flow goes through sync actor methods, either called directly or wrapped in a message-send generator (see Scope Limitation above).

Class Hierarchy

Object
└── Stream (sealed)                ← ONE type for all sequential data

Stream is not abstract — it's the concrete type. Everything that produces sequential data returns a Stream:

// Collections
#(1, 2, 3) stream                  // => Stream over elements
'hello' stream                     // => Stream over characters
#{#a => 1} stream                  // => Stream over Associations

// Files
File lines: 'data.csv'            // => Stream of lines (lazy, constant memory)

// Generators (pure-functional, no process needed)
Stream from: 1                    // => infinite Stream: 1, 2, 3, ...
Stream from: 1 by: [:n | n * 2]  // => infinite Stream: 1, 2, 4, 8, ...

// Stateful generators — use actors (duck-typing or future Behaviours)
fib := FibonacciGenerator spawn   // Actor that speaks Stream protocol
fib take: 10                      // => #(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

// Actor-mediated Streams — generator uses message sends, not direct port access
// (ADR 0051 "message-send generator" pattern)
agent := Subprocess open: "tail" args: #("-f", "log").
agent lines do: [:line | Transcript show: line]  // Stream backed by readLine calls
agent stderrLines select: [:l | l includesSubstring: "WARN"]

The Universal Protocol

Every Stream responds to the same Smalltalk-named messages. Operations are either lazy (return a new Stream) or terminal (force evaluation and return a result):

// Same pipeline works on ANY data source
countErrors: aStream =>
  s := aStream select: [:line | line includes: 'ERROR']
  s inject: 0 into: [:count :line | count + 1]

countErrors: (File lines: 'app.log')        // file
countErrors: (#('ERROR: x', 'OK', 'ERROR: y') stream)  // collection
countErrors: Console lines                   // stdin (future)

Stream — Lazy Pipelines

The core abstraction. Each operation wraps the previous in a closure — nothing computes until a terminal operation (asList, do:, take:, inject:into:) pulls elements through.

// Lazy — nothing computed yet, just a recipe
s := Stream from: 1
s := s select: [:n | n isEven]
s := s collect: [:n | n * n]
s take: 5
// NOW computes: => #(4, 16, 36, 64, 100)

// From a collection — lazy wrapper, no copy
#(1, 2, 3, 4, 5) stream
  select: [:n | n > 2]
// => Stream (unevaluated)

// Terminal operations force evaluation
(#(1, 2, 3, 4, 5) stream select: [:n | n > 2]) asList
// => #(3, 4, 5)

Key protocol:

Method        Type      Description
------------  --------  -----------
select:       Lazy      Filter elements
collect:      Lazy      Transform elements
reject:       Lazy      Inverse filter
take:         Terminal  First N elements as List
drop:         Lazy      Skip first N elements
do:           Terminal  Iterate with side effects
inject:into:  Terminal  Fold/reduce
detect:       Terminal  First matching element
asList        Terminal  Materialize to List
anySatisfy:   Terminal  Boolean: any match?
allSatisfy:   Terminal  Boolean: all match?

Implementation: Closure-based, following Elixir's proven model:

%% Each lazy op wraps previous in a closure
%% Stream internal: #{generator => fun() -> {element, NextFun} | done}
%% select: wraps generator, skipping non-matching elements
%% collect: wraps generator, transforming each element
%% Terminal ops: pull elements until done or limit reached
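
To make the comments above concrete, here is a minimal sketch of the same shape in Python. It is a model under stated assumptions rather than Beamtalk's runtime code: DONE, from_list, count_from, select, collect, and take are hypothetical names, and a generator is represented as a zero-argument function that returns either (element, next_generator) or DONE.

```python
DONE = object()  # sentinel playing the role of the `done` atom


def from_list(items, i=0):
    # Finite source: yields items[i], then a generator for the rest.
    def gen():
        if i < len(items):
            return items[i], from_list(items, i + 1)
        return DONE
    return gen


def count_from(n, step=lambda x: x + 1):
    # Infinite source: never returns DONE.
    return lambda: (n, count_from(step(n), step))


def select(gen, pred):
    # Lazy: wraps the generator and skips non-matching elements when pulled.
    def wrapped():
        g = gen
        while True:
            result = g()
            if result is DONE:
                return DONE
            element, g = result
            if pred(element):
                return element, select(g, pred)
    return wrapped


def collect(gen, fn):
    # Lazy: wraps the generator and transforms each element when pulled.
    def wrapped():
        result = gen()
        if result is DONE:
            return DONE
        element, nxt = result
        return fn(element), collect(nxt, fn)
    return wrapped


def take(gen, n):
    # Terminal: pulls until n elements are collected or the source is done.
    out = []
    while len(out) < n:
        result = gen()
        if result is DONE:
            break
        element, gen = result
        out.append(element)
    return out


evens_squared = collect(select(count_from(1), lambda n: n % 2 == 0), lambda n: n * n)
print(take(evens_squared, 5))
print(take(select(from_list([1, 2, 3, 4, 5]), lambda n: n > 2), 10))
```

Building evens_squared does no work; only take pulls elements, which is why the infinite source is safe as long as a bounded terminal operation is used.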

Error handling — misuse examples:

// Infinite stream + asList = hangs (programmer error, like 1/0)
(Stream from: 1) asList           // ⚠️ Never terminates — use take: first

// Safe: always bound infinite streams
(Stream from: 1) take: 10         // => #(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

// REPL inspection — Stream describes its pipeline, not its data
> s := #(1, 2, 3) stream select: [:n | n > 1]
Stream(select: [...])              // shows structure, not values
> s asList
#(2, 3)                            // terminal forces evaluation

File Streaming

File gains a class method that returns a Stream of lines — no new FileStream class needed:

// Read file lazily — no new class, just File + Stream
(File lines: 'data.csv') do: [:line |
  Transcript show: line
]

// Compose with Stream pipeline
headers := (File lines: 'data.csv') take: 1
data := (File lines: 'data.csv') drop: 1

// Block-scoped for explicit handle management
File open: 'data.csv' do: [:handle |
  (handle lines
    select: [:line | line includes: 'ERROR'])
    do: [:line | Transcript show: line]
]
// handle closed automatically

// Process large files in constant memory
lines := File lines: 'huge.log'
errors := lines select: [:line | line includes: 'ERROR']
errors do: [:line | Transcript show: line]

Implementation: File lines: opens a handle, returns a Stream whose generator calls file:read_line/1. When the stream is exhausted, the handle closes automatically. If the stream is abandoned without being fully consumed, the BEAM's process-linked file handle ensures cleanup when the owning process exits. Block-scoped File open:do: provides explicit lifecycle control for cases where deterministic cleanup matters.
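
The lifecycle described above (open, read one line per pull, close on exhaustion) can be sketched in Python as follows. This is an analogy only: file_lines is a hypothetical name, a Python generator stands in for the Stream, and the finally clause stands in for the automatic close. Cleanup of an abandoned generator relies on Python's own generator finalization, which is not the same mechanism as BEAM's process-linked handles.

```python
import os
import tempfile


def file_lines(path):
    # The handle is opened on the first pull and owned by this generator.
    handle = open(path, "r", encoding="utf-8")
    try:
        for line in handle:           # reads one line at a time
            yield line.rstrip("\n")
    finally:
        handle.close()                # runs when the stream is exhausted or closed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log")
    with open(path, "w", encoding="utf-8") as f:
        f.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    # Lines are filtered as they are read; only the matching lines are kept.
    errors = [line for line in file_lines(path) if "ERROR" in line]

print(errors)
```

Only the lines that match are retained, so memory use depends on the number of matches rather than on the size of the file.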

Cross-process constraint: File-backed Streams must be consumed by the same process that created them (BEAM file handles are process-local). To pass file data to an actor, materialize first: (File lines: 'data.csv') take: 100 returns a List that can be sent safely. Collection-backed Streams have no such restriction.

Collection Integration

Collections gain a stream method that returns a lazy Stream:

#(1, 2, 3, 4, 5) stream           // => Stream over list elements
'hello world' stream               // => Stream over characters
#{#a => 1, #b => 2} stream         // => Stream over Associations

This means collections keep their eager do:, collect:, select: for simple cases, but stream provides the lazy pipeline when needed.

Summary

Class   Kind                 Use Case
------  -------------------  --------
Stream  Value type (sealed)  Lazy pipelines; the ONE stream type
File    Existing             File lines: returns a Stream of lines

Prior Art

Smalltalk (Pharo/Squeak)

Erlang

Elixir

Rust

Kotlin

Ruby

Python

User Impact

Newcomer

Smalltalk Developer

Erlang/BEAM Developer

Operator

Steelman Analysis

"Just use Erlang's file module directly via interop" (BEAM developer)

Best argument: Beamtalk already has BEAM interop. Erlang's file module is battle-tested with 30+ years of production use, zero-overhead, and covers every edge case (symlinks, encodings, permissions, large files, memory-mapped I/O). But it's not just about files — you're building an entire lazy evaluation framework on top of BEAM, when Erlang already has list comprehensions and Elixir (available via interop) already has Stream. Every Beamtalk Stream operation adds a closure layer. For a 5-line file processing task, the overhead of creating closures, wrapping generators, and pulling through a pipeline is worse than a simple file:read_line/1 loop. You're optimizing for elegance over the pragmatism that makes BEAM great.

Counter: Closure overhead on BEAM is low (Elixir's Stream module has run in production for 12+ years) though not literally free — for small collections (<1000 elements), eager collection methods will be faster. The key value isn't performance, it's composability: (File lines: 'app.log') select: [:l | l includes: 'ERROR'] in the REPL is one expression. The Erlang equivalent is 5 lines of handle management, pattern matching, and manual cleanup. For an interactive-first language, that matters. Advanced users can always drop to Erlang via interop, and eager collection methods remain the default for small-data cases.

"We should have ReadStream/WriteStream like Smalltalk" (Smalltalk purist)

Best argument: Beamtalk IS a compiler — and parsers are THE classic use case for ReadStream. Sequential consumption with peek (lookahead without consuming) and upTo: (consume until delimiter) are the building blocks of every hand-written parser, tokenizer, and protocol handler. Beamtalk's own lexer does exactly this. Dropping ReadStream means anyone writing a parser in Beamtalk has to reinvent sequential-consumption-with-lookahead on top of lazy pipelines, which is awkward — lazy streams are designed for transformation pipelines, not stateful character-by-character consumption.

WriteStream is equally practical: Beamtalk's codegen builds Core Erlang source by accumulating strings. WriteStream on: String new with nextPutAll: is 50 lines of implementation and covers a real, everyday need. Why force users into inject:into: gymnastics when the simple, proven tool exists?

Both classes are trivial to implement (~100 lines total), carry no design risk, and every Smalltalk developer expects them. The cost of NOT having them is higher than having them.

Counter: The strongest argument in this ADR. Two honest responses: (1) For parsing, Stream can support a peekable wrapper that adds peek and next — Rust does this with Iterator::peekable(). It's a focused addition rather than a whole positional stream hierarchy. (2) For string building, the need is real but it's not a stream — it's a buffer. If the need proves acute, we add StringBuffer as its own focused class, not as WriteStream which conflates output accumulation with sequential data reading. The key principle: don't build two parallel iteration hierarchies (positional + lazy) when one (lazy + focused utilities) suffices.
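
A peekable wrapper of the kind mentioned in response (1) needs very little machinery. The sketch below uses Python for illustration; Peekable, peek, next, and up_to are hypothetical names, and the Beamtalk API is not specified by this ADR. It buffers at most one element, so lookahead does not force the rest of the source.

```python
class Peekable:
    """Hypothetical one-element lookahead wrapper over any iterable."""

    _EMPTY = object()  # marks "nothing buffered" and "source exhausted"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def peek(self, default=None):
        # Look at the next element without consuming it.
        if self._buffer is self._EMPTY:
            self._buffer = next(self._it, self._EMPTY)
        return default if self._buffer is self._EMPTY else self._buffer

    def next(self, default=None):
        # Consume and return the next element.
        value = self.peek(default)
        self._buffer = self._EMPTY
        return value

    def up_to(self, delimiter):
        # Consume elements up to the delimiter; the delimiter is consumed too.
        out = []
        while self.peek(self._EMPTY) is not self._EMPTY:
            item = self.next()
            if item == delimiter:
                break
            out.append(item)
        return out


p = Peekable("key=value")
key = "".join(p.up_to("="))
first = p.peek()                     # lookahead only, nothing consumed
rest = "".join(iter(p.next, None))   # drain the remainder
print(key, first, rest)
```

In this sketch None doubles as the default end marker, so a source that legitimately contains None would need an explicit default.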

"One Stream class can't cover everything" (Type theorist)

Best argument: The Stream returned by #(1,2,3) stream and the Stream returned by Stream from: 1 are fundamentally different objects wearing the same type. Call asList on the infinite one and your program hangs forever. Call size on a file stream and it reads the entire file just to count lines. Call stream again on a generator and you get a fresh sequence, not a replay. The unified type hides critical failure modes.

In practice, this means: a function that accepts "a Stream" cannot know if it's safe to materialize it. Library authors must document "this only works with finite streams" — which is exactly the type information that should be in the type, not in prose. Rust separates Iterator (pull-based, possibly infinite) from ExactSizeIterator (known length) and Read (I/O bytes) for exactly this reason. One type isn't simplicity — it's lost information.

Counter: Beamtalk is dynamically typed — List can contain integers, strings, and actors in the same list, and nobody complains. The same pragmatism applies to Stream. asList on an infinite stream is a programmer error, like 1/0 — the language doesn't prevent division by zero either. In practice, users know their data source. And take: exists precisely to make infinite streams safe: (Stream from: 1) take: 10 always works. If Behaviours arrive later, we can formalize FiniteStream vs Stream — but building multiple classes now for a dynamically typed language is premature.

"Lazy evaluation is premature — eager is simpler" (Incrementalist)

Best argument: The debugging story is the killer. When a lazy pipeline produces wrong results, where is the bug? In the select:? The collect:? The source generator? With eager evaluation, you inspect each intermediate collection — it exists, it's a real value, you can print it. With lazy evaluation, intermediate values don't exist — they only materialize at the terminal operation. Stack traces point at asList, not at the select: three steps back that had the wrong predicate.

This matters doubly for an interactive-first language. The REPL is your debugger. Beamtalk's whole philosophy is "inspect intermediate values." Lazy evaluation is the opposite — it removes intermediate values. You're undermining your own design principle.

And there's a subtler gotcha: side effects in lazy pipelines run at terminal time, not at definition time. stream collect: [:x | Transcript show: x. x * 2] prints nothing when you define it — it prints when you call asList. For newcomers, this is deeply confusing. Elixir developers learn this the hard way; do we want that learning curve?

Counter: This is the most legitimate objection — and it requires concrete commitments, not hand-waving. Three specific mitigations: (1) Eager collection methods (List select:, List collect:) remain the default for simple cases — most users never need stream. Lazy is opt-in, not forced. (2) In the REPL, terminal operations run immediately (you type s take: 5 and see results), so interactivity is preserved — each temp variable is inspectable. (3) Stream must ship with good printString showing pipeline structure, e.g. Stream(from: 1 | select: [:n | n isEven] | collect: [:n | n * n]). This is a Phase 1 requirement, not a "nice to have." Without it, lazy Streams are opaque in the REPL and the interactive-first principle is violated. The side-effect timing gotcha (lazy side effects run at terminal time) is real and must be documented prominently in Stream's class documentation and the REPL tutorial.
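
Mitigation (3) and the side-effect timing gotcha can both be seen in a small model. The following Python sketch is illustrative only: LazyStream and its methods are hypothetical, and the stage labels are passed in by hand because Python lambdas carry no source text, whereas a Beamtalk implementation would presumably derive them from the blocks.

```python
import itertools


class LazyStream:
    """Hypothetical lazy stream that records a description of each stage."""

    def __init__(self, make_iter, stages):
        self._make_iter = make_iter   # zero-argument callable returning a fresh iterator
        self._stages = stages         # human-readable labels, one per stage

    @classmethod
    def count_from(cls, start):
        return cls(lambda: itertools.count(start), [f"from: {start}"])

    def select(self, pred, label="[...]"):
        return LazyStream(lambda: filter(pred, self._make_iter()),
                          self._stages + [f"select: {label}"])

    def collect(self, fn, label="[...]"):
        return LazyStream(lambda: map(fn, self._make_iter()),
                          self._stages + [f"collect: {label}"])

    def take(self, n):
        # Terminal: the only place where elements are actually pulled.
        return list(itertools.islice(self._make_iter(), n))

    def __repr__(self):
        # Shows the pipeline structure, not the data.
        return "Stream(" + " | ".join(self._stages) + ")"


log = []  # records when the collect block really runs
s = (LazyStream.count_from(1)
     .select(lambda n: n % 2 == 0, "[:n | n isEven]")
     .collect(lambda n: log.append(n) or n * n, "[:n | n * n]"))

shown = repr(s)
logged_before = list(log)   # still empty: defining the pipeline ran nothing
result = s.take(3)          # the side effect happens here
print(shown, logged_before, result, log)
```

Printing the stream shows its structure without evaluating anything, and the log stays empty until take runs, which is the timing behaviour the documentation needs to call out.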

"This creates a confusing parallel to Collection protocol" (API designer)

Best argument: After this ADR, Beamtalk has TWO things that respond to select:, collect:, do:, inject:into: — Collections (eager) and Streams (lazy). Same method names, different semantics. When a newcomer reads code that says things select: [:x | x > 0], they have to check whether things is a List or a Stream to know when the filtering actually happens. When a library accepts "something you can collect: on," does it work with both? Do you document that?

Kotlin has this exact problem: List.filter {} vs Sequence.filter {} — same name, different evaluation strategy. It's a known source of confusion. You're deliberately importing that confusion into Beamtalk.

The cleaner design: make ONE of them primary. Either collections are lazy by default (like Haskell), or streams use different method names (like Elixir's Stream.map vs Enum.map).

Counter: Smalltalk's message-send model resolves this more cleanly than Kotlin. The Kotlin confusion arises because extension functions and type inference hide which type you're calling on — things.filter {} looks identical whether things is a List or Sequence. In Beamtalk, you're always sending a message to a known receiver:

aList select: [:x | x > 0]            // I know this is a List → eager
aList stream select: [:x | x > 0]     // I explicitly opted into Stream → lazy

The opt-in to laziness is visible at the call site — you wrote stream. In the REPL, you can inspect the receiver's class at any time. Polymorphism — same name, different behavior based on receiver — is literally the point of Smalltalk's design. select: on List returns a List. select: on Stream returns a Stream. The receiver IS the boundary, and it's always explicit. Making collections lazy by default (Haskell) would break the simplicity of #(1,2,3) select: [:x | x > 0] returning a List. Using different names (Elixir's Enum.map vs Stream.map) means you can't write generic code that works with both. Same names with explicit opt-in is the right balance — and Smalltalk's paradigm makes it work better than Kotlin's.

Chaining syntax note: Message-send languages (Smalltalk, Newspeak, Beamtalk) have a known limitation where keyword messages cannot chain without parentheses or temporary variables. No satisfying syntax sugar has been found in the Smalltalk literature — the Pharo Sequence framework (IWST 2023) addresses this at the library level but not syntactically. Temporary variables are the pragmatic approach and align with Beamtalk's interactive-first philosophy (each step is inspectable in the REPL). Research into novel pipeline syntax is tracked in BT-506.

Alternatives Considered

Alternative A: Smalltalk ReadStream/WriteStream Hierarchy

Follow Smalltalk's 1980s model: Stream → PositionableStream → ReadStream, WriteStream, plus FileStream.

// Smalltalk model
stream := ReadStream on: #(1, 2, 3)
stream next      // => 1
stream position  // => 1
stream position: 0  // reset

Rejected because: Positional streams assume random access (position, position:, reset) which doesn't generalize to files, network, or infinite sequences. Modern Pharo is moving away from this model (FileReference replacing FileStream). Mutable position state is un-BEAM-like. Every modern language (Elixir, Rust, Kotlin, Java 8+) converged on lazy pipelines instead.

Alternative B: Elixir Stream Interop Only

Skip building native Beamtalk streams. Use Elixir's Stream and File.stream! via interop.

// Hypothetical interop
lines := Elixir.File streamBang: 'data.csv'

Rejected because: Requires Elixir as a dependency. Syntax becomes awkward (Elixir module calls, not Smalltalk-style message sends). Misses the opportunity for select:, collect:, do: protocol consistency with Beamtalk collections.

Alternative C: Iterator Protocol (Rust/Python model)

Define an Iterable protocol that any object can implement, similar to Rust's Iterator trait or Python's __iter__.

// Hypothetical — requires Behaviours
behaviour Iterable
  next => ...    // returns {value, nextState} or #done
  
// Any class could implement Iterable
Object subclass: Range
  implements: Iterable
  next => ...

Rejected because: Requires language-level protocol/trait support (Behaviours) that Beamtalk doesn't have yet. A concrete Stream class delivers the same user value now. When Behaviours arrive, Stream naturally becomes the reference implementation of an Iterable behaviour — the design is forward-compatible, not locked in.

Alternative D: FileStream as Actor

Wrap file handles in a gen_server (actor) for supervised lifecycle management.

Rejected because: Erlang developers do file I/O inline, not via process wrappers. The BEAM already links file handles to the calling process for auto-cleanup. A gen_server adds ~5μs overhead per call and supervision complexity for no benefit in the common case. Block-scoped File open:do: handles cleanup idiomatically. Users who need actor-wrapped files can build that at the application level.

Alternative E: Do Nothing (Status Quo)

Keep the current state: File readAll: for files, eager collection iteration for data processing. Rely on Erlang interop for anything beyond whole-file reads.

Rejected because: The status quo works for small-data, simple cases — but it's a dead end. Users cannot read large files without loading them into memory. Users cannot compose data processing pipelines. Every new data source (network, stdin, generators) would need its own bespoke iteration pattern. The "do nothing" option is acceptable for 2026 if Beamtalk only targets small scripts, but not if it aims to be a general-purpose language. The investment in Stream pays off across every future I/O feature.

Alternative F: Eager File.lines + Fill Collection Gaps Only

Add File lines: returning a List (eager), plus fill missing select:, collect: on Set/Dictionary/String. No lazy Stream class.

File lines: 'config.txt'    // Returns a List (eager, whole file)
aSet select: [:x | x > 0]   // Now works, returns a Set

Rejected because: Handles the 80% case (small-to-medium files, consistent collection protocol) but closes the door on large-file processing and infinite sequences. If File lines: returns a List, a 1GB log file loads entirely into memory. The incremental cost of lazy Stream is bounded (one new class), while the cost of retrofitting laziness later is high (changing return types is a breaking change). Building Stream now, while the API surface is small, is cheaper than adding it after users depend on eager File lines: returning a List. However, this alternative correctly identifies that Phase 3 (collection stream) is lower priority than Phase 1-2.

Consequences

Positive

Negative

Neutral

Implementation

Phase 1: Stream Core

Phase 2: File Streaming

Phase 3: Collection Integration

Future Phases (separate ADRs/issues)

Restored (revised 2026-03-05): The message-send generator pattern (ADR 0051 Subprocess lines) means these integration points ARE viable as Streams — the generator calls the actor via gen_server:call, no resource handle crosses the process boundary:

Migration Path

Porting Smalltalk ReadStream code

// Smalltalk: ReadStream on: #(1 2 3)
// Beamtalk:
#(1, 2, 3) stream

// Smalltalk: stream next
// Beamtalk: use take: or terminal operations instead of positional next
(#(1, 2, 3) stream) take: 1   // => #(1)

Porting Smalltalk WriteStream code

// Smalltalk: WriteStream on: String new, then nextPutAll:
// Beamtalk: use string concatenation or List join
#('Hello', ', ', 'World') inject: '' into: [:acc :s | acc , s]
// => 'Hello, World'

References