ADR 0014: Beamtalk Test Framework — Native Unit Tests and CLI Integration Tests

Status

Implemented (2026-02-15)

Context

Problem Statement

Beamtalk currently has 40 E2E test files (~3,400 lines) in tests/e2e/cases/ that test language features. These tests work by:

  1. Starting a REPL daemon (Erlang node + TCP server)
  2. Sending each expression over TCP to the REPL
  3. Reading back the result and comparing against // => expected comments
  4. Running serially, one expression at a time

This approach has served us well for validating the full compilation pipeline, but it has serious limitations:

Performance: The full E2E suite takes ~90 seconds. Each expression requires a TCP round-trip through the REPL. As the language grows, this will become a bottleneck.

Misclassification: Most E2E tests are actually unit tests for language features (arithmetic, string operations, block semantics) — they don't need the REPL at all. They test that 1 + 2 equals 3, not that the REPL can evaluate 1 + 2.

Missing real E2E coverage: We have no tests for actual end-to-end workflows: beamtalk build, beamtalk repl session management, beamtalk test, workspace lifecycle, CLI argument parsing.

No native test framework: The language features doc (line 1049) describes a TestCase subclass: pattern, but there is no implementation. Users can't write tests in Beamtalk itself.

Current State

| Layer | Tool | Speed | What it tests |
| --- | --- | --- | --- |
| Rust unit tests | cargo test | Fast (~5s) | Parser, AST, codegen |
| Erlang unit tests | rebar3 eunit | Fast (~3s) | Runtime, primitives, object system |
| Compiler snapshot tests | cargo test | Fast (~2s) | 51 codegen snapshots |
| E2E tests | just test-e2e | Slow (~90s) | Language features via REPL |

The testing pyramid is inverted for language feature testing — everything goes through E2E when most could be fast compiled tests.

Constraints

  1. Tests must compile to BEAM bytecode (same pipeline as user code)
  2. Must work with EUnit (Erlang's standard test framework) for CI integration
  3. Must not require a running REPL daemon for pure language tests
  4. Must preserve the existing E2E suite for REPL/workspace integration testing
  5. Should feel idiomatic to Smalltalk developers (SUnit heritage)

Decision

Implement a native test framework, delivered in two phases:

Phase 1: Compiled Expression Tests (beamtalk test)

Reuse the existing // => assertion format but compile test files directly to EUnit modules — no REPL needed.

Test file format (unchanged from current E2E):

// test/integer_test.bt

// Basic arithmetic
1 + 2
// => 3

5 negated
// => -5

// String operations
'hello' size
// => 5

'hello' , ' world'
// => 'hello world'

What changes: The compiler parses // => comments as test assertions at compile time. The beamtalk test command then generates a thin EUnit wrapper in Erlang source that calls the compiled BEAM modules:

%% Generated EUnit wrapper for test/integer_test.bt
-module(integer_test_tests).
-include_lib("eunit/include/eunit.hrl").

line_3_test() ->
    ?assertEqual(3, beamtalk_integer:dispatch('+', [2], 1)).

line_6_test() ->
    ?assertEqual(-5, beamtalk_integer:dispatch(negated, [], 5)).

line_10_test() ->
    ?assertEqual(5, beamtalk_string:dispatch(size, [], <<"hello">>)).

line_13_test() ->
    ?assertEqual(<<"hello world">>, beamtalk_string:dispatch(',', [<<" world">>], <<"hello">>)).

Note: The wrapper is generated Erlang source (.erl), not Core Erlang. This lets us use EUnit's ?assertEqual macros directly. The actual Beamtalk expressions compile through the normal Core Erlang pipeline; the wrapper just calls the compiled dispatch functions.
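
The pairing of expressions with // => comments can be sketched as below. This is a minimal illustration over raw source lines; the real assertion parser works on the AST, and `extract_assertions` is a hypothetical name.

```rust
/// Sketch: pair each expression line with the `// => expected` comment that
/// follows it, returning (1-based line number, expression, expected value).
fn extract_assertions(source: &str) -> Vec<(usize, String, String)> {
    let mut pending: Option<(usize, String)> = None; // (line index, expression)
    let mut out = Vec::new();
    for (i, line) in source.lines().enumerate() {
        let trimmed = line.trim();
        if let Some(expected) = trimmed.strip_prefix("// =>") {
            // An assertion closes the most recent pending expression.
            if let Some((idx, expr)) = pending.take() {
                out.push((idx + 1, expr, expected.trim().to_string()));
            }
        } else if trimmed.is_empty() || trimmed.starts_with("//") {
            // Blank lines and ordinary comments carry no expression.
        } else {
            pending = Some((i, trimmed.to_string()));
        }
    }
    out
}
```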

CLI command:

$ beamtalk test
Compiling 40 test files...
Running tests...

  integer_test: 12 tests, 12 passed ✓
  string_test: 8 tests, 8 passed ✓
  block_test: 15 tests, 15 passed ✓
  ...

40 files, 287 tests, 287 passed, 0 failed (1.2s)

Key properties:

Stateful tests: Tests that use variables across expressions (e.g., counter := Counter spawn followed by counter increment) compile to a single EUnit test function with sequential statements, preserving variable bindings. Each such file becomes one EUnit test with internal assertions, which matches EUnit's fixture pattern.
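
The stateful/stateless decision can be sketched as a classifier. This is a heuristic illustration with hypothetical names; the real compiler would decide from the AST rather than string matching.

```rust
#[derive(Debug, PartialEq)]
enum TestShape {
    /// One EUnit function per assertion (independent expressions).
    PerAssertion,
    /// One EUnit function running all statements in order, so bindings
    /// like `counter := ...` stay visible to later assertions.
    SingleSequential,
}

/// Sketch: any `:=` assignment means later expressions may depend on
/// earlier state, so fold the whole file into one sequential test.
fn classify_file(expressions: &[&str]) -> TestShape {
    if expressions.iter().any(|e| e.contains(":=")) {
        TestShape::SingleSequential
    } else {
        TestShape::PerAssertion
    }
}
```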

REPL session:

> :test test/integer_test.bt
Compiling 1 test file...
12 tests, 12 passed ✓ (0.1s)

Error output when a test fails:

FAIL test/integer_test.bt:3
  Expression: 1 + 2
  Expected:   4
  Actual:     3

Phase 2: SUnit-style TestCase Classes

Add a TestCase base class enabling idiomatic Smalltalk-style test classes:

// test/counter_test.bt
@load test/fixtures/counter.bt

Object subclass: CounterTest

  setUp =>
    self.counter := Counter spawn

  testIncrement =>
    self.counter increment await
    self assert: (self.counter getValue await) equals: 1

  testMultipleIncrements =>
    3 timesRepeat: [self.counter increment await]
    self assert: (self.counter getValue await) equals: 3

  testInitialValue =>
    self assert: (self.counter getValue await) equals: 0

Key properties:

REPL session:

> :load test/counter_test.bt
> CounterTest runAll
3 tests, 3 passed ✓ (0.3s)

> CounterTest run: #testIncrement
1 test, 1 passed ✓ (0.1s)

Error output:

FAIL CounterTest >> testMultipleIncrements
  assert:equals: failed
  Expected: 4
  Actual:   3
  Location: test/counter_test.bt:12

E2E Tests: Kept for Integration

The existing tests/e2e/ suite continues to test REPL-specific behavior:

// tests/e2e/cases/workspace_bindings.btscript
// These NEED the REPL because they test workspace state

Transcript show: 'Hello from E2E'
// => nil

Transcript class
// => TranscriptStream

Criteria for E2E vs compiled test:

| Test needs... | Use |
| --- | --- |
| Pure language features (arithmetic, strings, blocks) | beamtalk test (compiled) |
| Actor spawning + messaging | beamtalk test with @load |
| Workspace bindings (Transcript, Beamtalk) | E2E (needs REPL) |
| REPL commands (:load, :quit, :help) | E2E (needs REPL) |
| CLI commands (beamtalk build, beamtalk test) | E2E / integration test |

Bootstrap Constraint: Why Both Phases Are Needed

SUnit-style TestCase classes (Phase 2) cannot be used to test the stdlib primitives they depend on — this creates a circular dependency:

TestCase (stdlib class)
  └── depends on: Object, Integer (comparisons), String (error messages)

Integer tests
  └── depends on: TestCase
  └── ...which depends on: Integer  ← circular!

In Pharo/Squeak this doesn't matter because everything lives in the image with no compilation order. But Beamtalk has separate compilation with OTP app dependencies, so the build order matters.

Resolution: Phase 1 expression tests (// =>) have zero framework dependencies — they compile directly to EUnit with no Beamtalk class imports. This makes them the correct tool for testing stdlib primitives. Phase 2 TestCase classes are for user-level code (actors, application logic) where setUp/tearDown and rich assertions justify the dependency.

| Layer | Test with | Why |
| --- | --- | --- |
| Primitives (Integer, String, Float, Boolean) | Phase 1 (// =>) | No framework dependency — avoids bootstrap circularity |
| Stdlib classes (Array, Block, Dictionary) | Phase 1 (// =>) | Same reason — these are below TestCase in the dependency chain |
| User classes and actors | Phase 2 (TestCase) | setUp/tearDown, fixtures, rich assertions |
| Complex integration scenarios | Phase 2 (TestCase) | State management, test grouping |

This split is not a limitation; each phase is the right tool for its job. Stdlib tests are mostly "does 1 + 2 equal 3?", which is exactly what expression tests excel at.

Prior Art

SUnit (Pharo/Squeak Smalltalk)

The original xUnit framework. Tests are classes inheriting from TestCase with methods prefixed test. Supports setUp/tearDown, assert:equals:, should:raise:.

What we adopt: Class structure, naming conventions, assertion API, skip protocol (skip/skip:). What we adapt: No poolDictionaries or classVariableNames. BEAM process isolation replaces Smalltalk image-level isolation.

EUnit (Erlang)

Erlang's built-in test framework. Test functions end in _test or _test_. Supports ?assertEqual, ?assertMatch macros, test generators, fixtures.

What we adopt: EUnit as the compilation target (Phase 1 and 2 both generate EUnit modules). What we adapt: Beamtalk syntax instead of Erlang macros.

ExUnit (Elixir)

Compile-time test generation via macros. test "name" do ... end blocks, assert macro, setup/setup_all callbacks. Data-driven test generation with for.

What we adopt: The idea of compile-time test generation (our compiler does what Elixir's macro system does). What we adapt: Message-passing assertions instead of macro-based assertions.

Gleam Testing

gleam test compiles test modules, runs each test in its own BEAM process. Simple should.equal(actual, expected) assertions. Process isolation for crash safety.

What we adopt: The beamtalk test CLI pattern, process isolation. What we adapt: SUnit-style assertions instead of function-call assertions.

User Impact

Newcomer (from Python/JS)

Phase 1 is immediately accessible — the // => format is like doctest in Python or inline assertions in tutorials. No new concepts to learn. Phase 2 introduces test classes, familiar from any xUnit framework.

Smalltalk Developer

Phase 2 is exactly what they expect — SUnit is the canonical Smalltalk test framework. TestCase subclass: MyTest with assert:equals: is home territory. Phase 1 is a nice bonus for quick checks.

Erlang/Elixir Developer

Both phases compile to EUnit, which they already know. beamtalk test works like rebar3 eunit or mix test. No new test infrastructure to learn at the BEAM level.

Production Operator

Tests run fast (1-2s compiled vs 90s E2E). CI pipelines are faster. EUnit output integrates with existing CI tools. BEAM process isolation means crashed tests don't take down the suite.

Steelman Analysis

Alternative: Keep Pure E2E (Current Approach)

| Cohort | Their strongest argument |
| --- | --- |
| 🧑‍💻 Newcomer | "The current format is dead simple — I write an expression and the expected result. No classes, no imports, no boilerplate." |
| 🎩 Smalltalk purist | "In Smalltalk, we test in the live image. The REPL is the test environment. Compiling tests separately breaks the interactive-first promise." |
| ⚙️ BEAM veteran | "EUnit already works fine for Erlang. Why add another layer? Just keep testing through the REPL." |
| 🏭 Operator | "90 seconds is acceptable for a CI pipeline. Don't add complexity for marginal speed gains." |
| 🎨 Language designer | "The test format IS the language tutorial format. Keeping them the same means examples are always tested." |

Rebuttal: The speed problem will get worse as the language grows. 40 files at 90s means 200 files will take 7+ minutes. And we genuinely lack REPL/CLI integration tests because everything goes through the same slow path.

Alternative: Phase 2 Only (SUnit-style from the Start)

| Cohort | Their strongest argument |
| --- | --- |
| 🧑‍💻 Newcomer | "One test framework to learn, not two. TestCase classes are universal." |
| 🎩 Smalltalk purist | "SUnit IS the Smalltalk way. Skipping it for a simpler format is a disservice to the language's heritage." |
| ⚙️ BEAM veteran | "Class-based tests map directly to EUnit fixtures. setUp/tearDown = EUnit setup. Clean." |
| 🏭 Operator | "One framework, one test command, one CI step. Simplicity." |
| 🎨 Language designer | "TestCase classes use the class system — testing the language WITH the language is dogfooding at its best." |

Rebuttal: Phase 2 requires the class system to be more mature (class-side methods, instantiation protocol). Phase 1 works today with zero language additions. Delivering Phase 1 first gives us fast tests immediately while Phase 2 develops.

Tension Points

Alternatives Considered

Alternative A: Optimize Current E2E Runner (Parallel Execution)

Run E2E tests in parallel REPL sessions to reduce wall-clock time.

Why rejected: Treats the symptom (speed) not the cause (architectural mismatch). Most tests don't need a REPL. Parallel REPL sessions add complexity and race condition risks.

Alternative B: Pragma-Based Test Methods (@test)

@test 'addition'
(1 + 2) assertEquals: 3

Why rejected: New pragma syntax for something that can be achieved with naming conventions (Phase 2) or existing comment syntax (Phase 1). Adds parser complexity without clear benefit over either phase.

Alternative C: Property-Based Testing First

Focus on property-based testing (QuickCheck/PropEr style) instead of unit tests.

Why rejected: Property-based testing is valuable but requires a more mature type system and standard library. Better as a Phase 3 addition built on top of TestCase.

Consequences

Positive

Negative

Neutral

Implementation

Phase 1: Compiled Expression Tests

Effort: M (Medium) — ~250-350 lines across 5-7 files

| Component | Location | Description |
| --- | --- | --- |
| Assertion parser | crates/beamtalk-core/src/source_analysis/parser/ | Parse // => as TestAssertion AST nodes |
| EUnit codegen | crates/beamtalk-core/src/codegen/core_erlang/ | Generate EUnit test functions from assertion pairs |
| beamtalk test CLI | crates/beamtalk-cli/src/commands/test.rs | Scan dir → compile → run EUnit → format output |
| Test classifier | crates/beamtalk-cli/src/commands/test.rs | Detect workspace binding usage → route to E2E; compile all other tests including @load |
| Output formatter | crates/beamtalk-cli/src/commands/test.rs | Parse EUnit output → user-friendly format |

Affected layers: Parser (Rust), Codegen (Rust), CLI (Rust), minimal Erlang glue.
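
The test classifier's routing rule can be sketched as below. This is a heuristic string scan with hypothetical names; the real classifier would inspect resolved identifiers, not raw text.

```rust
/// Sketch: a test file that touches workspace bindings such as `Transcript`
/// or `Beamtalk` needs a live REPL and is routed to the E2E suite; every
/// other file (including those using `@load`) is compiled directly.
fn route_to_e2e(source: &str) -> bool {
    const WORKSPACE_BINDINGS: [&str; 2] = ["Transcript", "Beamtalk"];
    source
        .lines()
        .filter(|l| !l.trim_start().starts_with("//")) // ignore comment lines
        .any(|l| WORKSPACE_BINDINGS.iter().any(|b| l.contains(b)))
}
```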

Phase 2: SUnit-style TestCase

Effort: XL (Extra Large) — ~650-750 lines across 8-12 files

| Component | Location | Description |
| --- | --- | --- |
| TestCase class | stdlib/src/TestCase.bt | Assertion methods, lifecycle hooks |
| TestCase runtime | runtime/apps/beamtalk_runtime/src/beamtalk_test_case.erl | Assertion primitives |
| Test discovery | crates/beamtalk-cli/src/commands/test.rs | Find TestCase subclasses, extract test* methods |
| EUnit bridge | crates/beamtalk-core/src/codegen/core_erlang/ | Generate EUnit wrappers from TestCase methods |
| TestResult class | stdlib/src/TestResult.bt (optional) | Collect and report test results |

Depends on: ADR 0013 (class instantiation protocol — new for value objects), method introspection.

Skip Protocol (BT-1149)

TestCase provides skip and skip: for platform-conditional and environment-conditional tests.

API:

testUnixOnlyFeature =>
  System osFamily = 'unix' ifFalse: [^self skip: 'Unix only']
  // ... test body

Protocol:

  1. self skip: reason calls the skip: primitive which throws {bunit_skip, Reason}
  2. run_test_method/4 catches throw:{bunit_skip, Reason} and returns {skip, MethodName, Reason}
  3. structure_results/3 and format_results/2 count skips separately from passes and failures
  4. TestResult gains a skipped field accessible via result skipped
  5. hasPassed returns true when failed = 0, regardless of skip count
  6. Summary: N tests, P passed, S skipped, F failed — skipped only shown when S > 0

Follows SUnit: TestCase>>skip: in Pharo signals TestSkipped exception, same mechanism.
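
The counting and display rules above can be sketched in a few lines. The function names are hypothetical; only the rules come from the protocol (skipped shown only when non-zero, pass/fail determined solely by the failure count).

```rust
/// Sketch of the summary line: `N tests, P passed, S skipped, F failed`,
/// with the skipped segment omitted when S == 0.
fn summary(passed: u32, skipped: u32, failed: u32) -> String {
    let total = passed + skipped + failed;
    let mut s = format!("{total} tests, {passed} passed");
    if skipped > 0 {
        s.push_str(&format!(", {skipped} skipped"));
    }
    s.push_str(&format!(", {failed} failed"));
    s
}

/// A run passes when nothing failed, regardless of how many tests skipped.
fn has_passed(failed: u32) -> bool {
    failed == 0
}
```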

Phase 3: Future Enhancements (Out of Scope)

Migration Path

Moving E2E Tests to Compiled Tests

  1. Classify each tests/e2e/cases/*.btscript file:

    • No @load, no workspace bindings → move to test/ (compiled)
    • Uses @load but no workspace bindings → move to test/ with @load support
    • Uses workspace bindings → keep in tests/e2e/ (needs REPL)
  2. Gradual migration — move files one at a time, verify tests still pass

  3. Expected split:

    • ~30 files → compiled tests (pure language features)
    • ~10 files → remain E2E (REPL/workspace integration)
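
The three classification rules above can be sketched as a one-time triage function. This is an illustrative heuristic with hypothetical names, not the actual migration tooling.

```rust
#[derive(Debug, PartialEq)]
enum Destination {
    Compiled,         // no @load, no workspace bindings -> test/
    CompiledWithLoad, // uses @load only -> test/ with @load support
    E2e,              // uses workspace bindings -> stays in tests/e2e/
}

/// Sketch: apply the migration rules to one .btscript file's source.
fn classify_for_migration(source: &str) -> Destination {
    let uses_workspace = ["Transcript", "Beamtalk"].iter().any(|b| source.contains(b));
    if uses_workspace {
        Destination::E2e
    } else if source.lines().any(|l| l.trim_start().starts_with("@load")) {
        Destination::CompiledWithLoad
    } else {
        Destination::Compiled
    }
}
```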

Test Directory Convention

test/                    # Compiled Beamtalk tests (Phase 1 + 2)
├── integer_test.bt      # Phase 1: expression tests
├── string_test.bt       # Phase 1: expression tests
├── counter_test.bt      # Phase 2: TestCase class
└── fixtures/
    └── counter.bt       # Shared test fixtures

tests/e2e/               # REPL integration tests (keep existing)
├── cases/
│   ├── workspace_bindings.btscript
│   └── repl_commands.btscript
└── fixtures/
    └── counter.bt

References