ADR 0022: Embedded Compiler via OTP Port (with NIF option)

Status

Implemented (2026-02-15)

Context

Problem Statement

The Beamtalk compiler currently runs as a separate daemon process written in Rust, communicating with the BEAM runtime via JSON-RPC over Unix domain sockets. This architecture creates several pain points:

  1. Windows incompatibility — The daemon uses Unix domain sockets (~/.beamtalk/sessions/<session>/daemon.sock) and Unix-specific lifecycle management (SIGTERM, O_EXCL lockfiles). Windows has no native Unix socket support.

  2. Deployment complexity — Users must install both the beamtalk Rust binary and Erlang/OTP. The daemon must be started before the REPL can compile, adding a process management concern.

  3. Daemon lifecycle fragility — If the daemon crashes, the socket file remains orphaned. Clients get {error, {daemon_unavailable, ...}} errors. Recovery requires manual intervention (beamtalk daemon stop && beamtalk daemon start).

  4. Serialization overhead — Every compilation round-trips through JSON-RPC: Beamtalk source → JSON → Unix socket → JSON parse → compile → JSON encode → Unix socket → JSON parse → Core Erlang string. For REPL interactions this adds measurable latency.

  5. Two-process coordination — The REPL must discover the daemon socket path, handle connection failures, manage timeouts, and deal with protocol version mismatches. This is ~100 lines of IPC code in beamtalk_repl_eval.erl (daemon connection, JSON-RPC encoding, response parsing).

Current Architecture

┌─────────────────────┐         ┌─────────────────────┐
│   beamtalk CLI      │         │   BEAM Node          │
│   (Rust binary)     │         │                      │
│                     │  JSON   │  beamtalk_workspace  │
│  ┌───────────────┐  │  RPC    │  ┌────────────────┐  │
│  │ Compiler      │◄─┼────────┼──│ repl_eval      │  │
│  │ Daemon        │  │  Unix   │  │ (Erlang)       │  │
│  │ (Rust)        │──┼─socket──┼─►│                │  │
│  └───────────────┘  │         │  └────────────────┘  │
│                     │         │                      │
│  ┌───────────────┐  │         │  beamtalk_runtime    │
│  │ beam_compiler │  │ escript │  beamtalk_stdlib     │
│  │ (Core→BEAM)   │──┼────────┼─►(erlc)              │
│  └───────────────┘  │         │                      │
└─────────────────────┘         └──────────────────────┘

The daemon exposes five JSON-RPC methods: compile, compile_expression, diagnostics, ping, and shutdown. The REPL connects via gen_tcp:connect({local, SocketPath}, ...) and sends line-delimited JSON.

Constraints

Decision

Replace the separate compiler daemon with a beamtalk_compiler OTP application that abstracts the compilation backend. Start with OTP Port as the primary backend; add Rustler NIF as an optional high-performance backend later if incremental analysis demands sub-millisecond overhead.

The beamtalk-core crate (lexer, parser, semantic analysis, codegen) will be compiled as a standalone binary invoked via OTP Port, managed by an OTP supervisor. The REPL and build tools will call the compiler through the beamtalk_compiler API instead of JSON-RPC over Unix sockets. A NIF backend can be added behind beamtalk_compiler_backend if latency requirements change.

Rationale for Port-first: The steelman analysis (below) shows that Port solves every stated problem (Windows, daemon lifecycle, deployment) with better fault isolation than NIF. The latency difference (approximately 2 ms versus 0.01 ms) is negligible compared to 10–500 ms compilation times; NIF's sub-millisecond advantage matters only if compilation becomes a keystroke-level hot path, which is not on the current roadmap.

Architecture After

┌──────────────────────────────────────────┐
│              BEAM Node                    │
│                                           │
│  beamtalk_workspace (Live Programming)    │
│  ┌────────────────┐                       │
│  │ repl_eval      │                       │
│  │ (Erlang)       │                       │
│  └───────┬────────┘                       │
│          │ compile_expression/3            │
│          ▼                                │
│  beamtalk_compiler (Anti-Corruption Layer)│
│  ┌────────────────┐  ┌──────────────────┐ │
│  │ compiler_      │  │ beamtalk-core    │ │
│  │ backend.erl    │─►│ (Rust, OTP Port) │ │
│  └────────────────┘  └──────────────────┘ │
│                                           │
│  beamtalk_runtime  (Actor/Object System)  │
│  beamtalk_stdlib   (Standard Library)     │
│                                           │
│  OTP compile module (Core Erlang → BEAM)  │
└───────────────────────────────────────────┘

Compiler API

The beamtalk_compiler module exposes a backend-agnostic API. The implementation dispatches to the configured backend (Port by default, NIF in future Phase 6):

-module(beamtalk_compiler).
-export([compile/2, compile_expression/3, diagnostics/1, version/0]).

%% Compile a file, returning Core Erlang + diagnostics
-spec compile(Source :: binary(), ModuleName :: binary()) ->
    {ok, #{core_erlang := binary(), diagnostics := [map()]}} |
    {error, #{diagnostics := [map()]}}.
compile(Source, ModuleName) ->
    beamtalk_compiler_backend:compile(Source, ModuleName).

%% Compile a REPL expression with known variable bindings
-spec compile_expression(Source :: binary(), ModuleName :: binary(),
                         KnownVars :: [binary()]) ->
    {ok, #{core_erlang := binary(), diagnostics := [map()]}} |
    {error, #{diagnostics := [map()]}}.
compile_expression(Source, ModuleName, KnownVars) ->
    beamtalk_compiler_backend:compile_expression(Source, ModuleName, KnownVars).

%% Get diagnostics only (no codegen)
-spec diagnostics(Source :: binary()) ->
    {ok, [map()]}.
diagnostics(Source) ->
    beamtalk_compiler_backend:diagnostics(Source).

%% Return compiler version
-spec version() -> binary().
version() ->
    beamtalk_compiler_backend:version().
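
On the Rust side of the Port, the four API functions above could map onto one request type and one response type. The following is only a sketch under stated assumptions: the names Request, Response, dispatch and compile_stub are hypothetical, diagnostics are reduced to plain strings, and compile_stub is a placeholder rather than the real beamtalk-core pipeline.

```rust
/// One variant per function exported by `beamtalk_compiler`.
/// Field names follow the Erlang specs; the Rust types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    Compile { source: String, module_name: String },
    CompileExpression { source: String, module_name: String, known_vars: Vec<String> },
    Diagnostics { source: String },
    Version,
}

/// Mirrors the `{ok, ...} | {error, ...}` result shapes in the specs.
#[derive(Debug, Clone, PartialEq)]
pub enum Response {
    Compiled { core_erlang: String, diagnostics: Vec<String> },
    Failed { diagnostics: Vec<String> },
    Diagnostics(Vec<String>),
    Version(String),
}

/// Hypothetical stand-in for the real pipeline: rejects blank source and
/// otherwise returns a placeholder string instead of Core Erlang.
fn compile_stub(source: &str, module_name: &str) -> Result<String, Vec<String>> {
    if source.trim().is_empty() {
        Err(vec!["empty source".to_string()])
    } else {
        Ok(format!("module '{}' placeholder", module_name))
    }
}

/// Routes a request to the matching handler and wraps the outcome.
pub fn dispatch(request: Request) -> Response {
    match request {
        Request::Compile { source, module_name }
        | Request::CompileExpression { source, module_name, .. } => {
            match compile_stub(&source, &module_name) {
                Ok(core_erlang) => Response::Compiled { core_erlang, diagnostics: Vec::new() },
                Err(diagnostics) => Response::Failed { diagnostics },
            }
        }
        Request::Diagnostics { source } => match compile_stub(&source, "") {
            Ok(_) => Response::Diagnostics(Vec::new()),
            Err(diagnostics) => Response::Diagnostics(diagnostics),
        },
        // Placeholder version string, not the real compiler version.
        Request::Version => Response::Version("0.0.0-sketch".to_string()),
    }
}
```

Keeping one variant per API function means a new command has to be added in both the Erlang module and the Rust enum, which makes drift between the two sides easier to spot.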

Dirty Scheduler Usage (Future NIF Backend — Phase 6)

If the NIF backend is added in Phase 6, all compilation NIFs will set schedule = "DirtyCpu" so that they run on dirty CPU schedulers instead of blocking the normal BEAM schedulers:

#[rustler::nif(schedule = "DirtyCpu")]
fn compile(source: Binary, module_name: Binary) -> NifResult<Term> {
    // ... parse, analyze, codegen ...
}

Compilation typically takes 1–50 ms for REPL expressions and up to several seconds for large files — well beyond the 1 ms NIF budget for normal schedulers. The Port backend avoids this concern entirely since compilation runs in a separate OS process.

Port Wire Format

The Port uses Erlang External Term Format (ETF) over length-prefixed frames ({packet, 4}):

%% Erlang sends a request to the Port
Port = open_port({spawn_executable, CompilerBinary}, [{packet, 4}, binary, exit_status]),
Request = term_to_binary(#{command => compile, source => Source, module => ModuleName}),
port_command(Port, Request),

%% Erlang receives the response
receive
    {Port, {data, Data}} ->
        Response = binary_to_term(Data)
        %% #{status => ok, core_erlang => ..., diagnostics => [...]}
end.
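
The Erlang snippet above shows only the BEAM side. On the Rust side, {packet, 4} means each message arrives on stdin as a 4-byte big-endian length followed by that many payload bytes, and replies must be framed the same way on stdout. A minimal sketch of that framing, using only the standard library, might look like this. The function names and the MAX_FRAME_LEN limit are assumptions, and ETF decoding of the payload is left out because it would need an external crate.

```rust
use std::io::{self, Read, Write};

/// Upper bound on one frame, so a corrupt length prefix cannot trigger a
/// huge allocation. The 64 MiB value is an assumption, not a project setting.
const MAX_FRAME_LEN: usize = 64 * 1024 * 1024;

/// Reads one `{packet, 4}` frame: a 4-byte big-endian length, then the payload.
/// Returns `Ok(None)` on a clean end of input (the BEAM closed the port).
pub fn read_frame<R: Read>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    let mut filled = 0;
    // Fill the header manually so that end of input before the first byte
    // can be told apart from end of input in the middle of a header.
    while filled < header.len() {
        match input.read(&mut header[filled..]) {
            Ok(0) if filled == 0 => return Ok(None),
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "truncated frame header",
                ))
            }
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    let len = u32::from_be_bytes(header) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(Some(payload))
}

/// Writes one frame with its 4-byte big-endian length prefix, then flushes
/// so the BEAM sees the reply immediately.
pub fn write_frame<W: Write>(output: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload does not fit a 4-byte length",
        ));
    }
    output.write_all(&(payload.len() as u32).to_be_bytes())?;
    output.write_all(payload)?;
    output.flush()
}
```

A port loop would call read_frame on locked stdin until it returns Ok(None), handle each payload, and answer with write_frame on stdout. Treating end of input as a normal exit lets the process shut down when the BEAM closes the port.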

Why ETF over JSON or protobuf:

REPL Integration

beamtalk_repl_eval.erl simplifies from ~100 lines of socket/JSON-RPC code to a direct function call:

%% Before (daemon)
compile_via_daemon(Expression, ModuleName, Bindings, State) ->
    SocketPath = beamtalk_repl_state:get_daemon_socket_path(State),
    case connect_to_daemon(SocketPath) of
        {ok, Socket} ->
            Request = jsx:encode(#{...}),
            gen_tcp:send(Socket, [Request, <<"\n">>]),
            receive_and_parse_response(Socket);
        {error, _} ->
            {error, {daemon_unavailable, SocketPath}}
    end.

%% After (beamtalk_compiler — Port backend)
compile_expression(Expression, ModuleName, Bindings) ->
    KnownVars = [atom_to_binary(V) || V <- maps:keys(Bindings)],
    beamtalk_compiler:compile_expression(Expression, ModuleName, KnownVars).

Precompiled Binaries (OTP Port)

For the OTP Port backend, we distribute the compiler as a standalone executable, not as a NIF shared library. The Erlang node starts this executable via open_port/2.

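| Platform      | Architecture    | Executable artifact        |
|---------------|-----------------|----------------------------|
| Linux (glibc) | x86_64, aarch64 | beamtalk_compiler_port     |
| Linux (musl)  | x86_64, aarch64 | beamtalk_compiler_port     |
| macOS         | x86_64, aarch64 | beamtalk_compiler_port     |
| Windows       | x86_64          | beamtalk_compiler_port.exe |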

CI builds these precompiled Port binaries via a GitHub Actions cross-compilation matrix. Users without a Rust toolchain get the appropriate executable downloaded automatically; if the platform is not covered, the Erlang side falls back to compiling the compiler from source at install time.
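
The platform matrix and the source-build fallback can be expressed as a small lookup. This is an illustrative sketch only: the function names are hypothetical, the OS and architecture strings follow the values of std::env::consts::OS and std::env::consts::ARCH, and the glibc/musl distinction is not modelled because those constants do not expose it.

```rust
/// Name of the port executable for a target OS, per the artifact table.
pub fn port_executable_name(target_os: &str) -> String {
    let base = "beamtalk_compiler_port";
    if target_os == "windows" {
        format!("{}.exe", base)
    } else {
        base.to_string()
    }
}

/// Whether the table lists a precompiled binary for this OS/architecture pair.
/// A `false` result corresponds to the fall-back-to-source case.
pub fn is_precompiled(target_os: &str, target_arch: &str) -> bool {
    match (target_os, target_arch) {
        ("linux", "x86_64") | ("linux", "aarch64") => true,
        ("macos", "x86_64") | ("macos", "aarch64") => true,
        ("windows", "x86_64") => true,
        _ => false,
    }
}
```

For the host machine, is_precompiled(std::env::consts::OS, std::env::consts::ARCH) would indicate whether a download is expected to exist or a source build is needed.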

NIF backend note (Phase 6): If we later introduce a Rustler-based NIF backend, its .so/.dylib/.dll distribution and any rustler_precompiled usage will be specified in a separate ADR/phase, not here.

Core Erlang → BEAM Compilation

The erlc step (Core Erlang → BEAM bytecode) moves inside the BEAM node, fully in-memory — no temp files, no disk I/O. The Port returns Core Erlang as a binary in the ETF response, which is parsed and compiled directly:

%% Fully in-memory: Core Erlang binary → scan → parse → compile → load
compile_core_to_beam(CoreErlangBin, ModuleName) ->
    {ok, Tokens, _} = core_scan:string(binary_to_list(CoreErlangBin)),
    {ok, Forms} = core_parse:parse(Tokens),
    case compile:forms(Forms, [from_core, binary, return_errors]) of
        {ok, ModuleName, BeamBinary} ->
            code:load_binary(ModuleName, atom_to_list(ModuleName) ++ ".beam", BeamBinary);
        {error, Errors, _Warnings} ->
            {error, Errors}
    end.

This eliminates both the escript subprocess spawn and temporary .core files on disk (ref: BT-48). The entire pipeline is in-memory: Source → Port/ETF → Core Erlang binary → core_scan → core_parse → compile:forms → code:load_binary.

Prior Art

Gleam (Rust compiler, targets BEAM)

Gleam's compiler is a standalone Rust binary that generates Erlang source files. It does not embed into the BEAM — it's a build tool that runs before the BEAM starts. Gleam has no REPL (as of 2026), so the latency of a separate process isn't a concern.

What we learn: A separate Rust compiler works well for batch compilation. But Beamtalk's interactive-first philosophy demands tighter integration for REPL responsiveness.

Elixir + Rustler ecosystem

Many Elixir libraries use Rustler NIFs for CPU-intensive work: explorer (data frames), tokenizers (ML tokenization), html5ever (HTML parsing). These prove the pattern is production-ready at scale.

What we learn: rustler_precompiled solves the distribution problem. Dirty CPU schedulers handle CPU-bound work safely. The pattern is well-established.

Pharo/Squeak (Smalltalk)

The compiler is embedded in the image — parsing and compilation happen inside the VM. This enables the live, interactive development that Beamtalk aspires to.

What we learn: Embedding the compiler is the Smalltalk way. The compiler should be part of the live environment, not external tooling.

LFE (Lisp Flavoured Erlang)

LFE's compiler is written in Erlang and runs inside the BEAM. Compilation is a function call, not an external process. This gives LFE a seamless REPL experience.

What we learn: In-process compilation on BEAM is the natural model. Our Rust compiler needs to cross the Port boundary, but the result should feel the same.

TypeScript (Mainstream — language server architecture)

TypeScript's tsc is a standalone compiler, but tsserver (the language server) embeds the compiler for IDE responsiveness. The language server runs as a separate Node.js process communicating via JSON-RPC — similar to our current daemon. TypeScript considered but rejected in-process embedding for VS Code due to crash isolation concerns.

What we learn: Even mainstream toolchains face the same daemon-vs-embedded trade-off. TypeScript chose process isolation for safety. However, TypeScript's compilation is orders of magnitude heavier than Beamtalk's; the risk calculus is different for a language with sub-100ms REPL compilations.

User Impact

Newcomer

Smalltalk Developer

Erlang/BEAM Developer

Production Operator

Steelman Analysis

Option A: OTP Port (Supervised External Process) — Recommended

Option B: Keep Separate Daemon (Status Quo)

Option C: Embedded Compiler (Rustler NIF) — Future Option (Phase 6)

Tension Points

The core tension is latency vs fault isolation:

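|                           | NIF                                          | Port                    | Daemon                              |
|---------------------------|----------------------------------------------|-------------------------|-------------------------------------|
| Call overhead             | ~0.01 ms                                     | ~2 ms                   | ~5–10 ms                            |
| Typical compilation       | 10–500 ms                                    | 10–500 ms               | 10–500 ms                           |
| User-perceived difference | None                                         | None                    | Slight                              |
| Compiler crash impact     | Node dies (all actors, state, sessions lost) | Port restarts (~50 ms)  | Daemon restarts (~500 ms)           |
| Windows support           | ✅ .dll                                      | ✅ stdin/stdout         | ❌ No Unix sockets (TCP workaround) |
| Deployment                | Single release                               | Single release + binary | Two components                      |
| Debuggability             | Hard (NIF in BEAM process)                   | Easy (separate process) | Easy (separate process)             |
| Independent upgrades      | No                                           | No                      | Yes                                 |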

Key observations:

  1. Latency doesn't differentiate. All three options are dominated by compilation time (10–500 ms). The call overhead difference (0.01 ms vs 2 ms vs 5–10 ms) is noise.
  2. Fault isolation is the real differentiator. In a production workspace with running actors, a NIF crash is catastrophic. A port crash is a hiccup.
  3. The crypto/ssl comparison is misleading. Those NIFs run ~0.1 ms stateless operations. A compiler NIF runs ~100 ms with complex state. Different risk profile entirely.
  4. Port solves the same deployment problems as NIF — no socket files, no daemon lifecycle, Windows-compatible — without the crash risk.
  5. NIF's only real advantage is if compilation becomes a hot path — e.g., live recompilation on every keystroke for incremental analysis. Today's REPL model (compile on Enter) doesn't need sub-millisecond overhead.
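
The arithmetic behind the first observation can be checked directly. The sketch below uses the approximate figures from the comparison table; the function and constant names are hypothetical, and the daemon entry assumes the upper end of its 5–10 ms range.

```rust
/// Call overheads in milliseconds, taken from the comparison table.
/// The daemon entry uses the upper end of its 5-10 ms range.
pub const CALL_OVERHEAD_MS: [(&str, f64); 3] =
    [("NIF", 0.01), ("Port", 2.0), ("Daemon", 10.0)];

/// Share of end-to-end latency that is call overhead rather than compilation.
pub fn overhead_share(call_overhead_ms: f64, compile_ms: f64) -> f64 {
    call_overhead_ms / (call_overhead_ms + compile_ms)
}

/// For one compilation time, returns (backend, total latency in ms, overhead share).
pub fn compare(compile_ms: f64) -> Vec<(&'static str, f64, f64)> {
    CALL_OVERHEAD_MS
        .iter()
        .map(|&(name, overhead)| {
            (
                name,
                overhead + compile_ms,
                overhead_share(overhead, compile_ms),
            )
        })
        .collect()
}
```

With these figures, the Port's share of total latency falls from about 17% for a 10 ms compilation to about 0.4% for a 500 ms one, while the absolute gap to the NIF stays near 2 ms throughout.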

Alternatives Considered

Alternative 1: Keep Separate Daemon (Status Quo)

The daemon works today on Linux and macOS. Windows support could be added with TCP instead of Unix sockets.

Rejected because:

Alternative 1b: Daemon with TCP for Windows (Incremental Fix)

Keep the daemon architecture but replace Unix sockets with TCP, adding named pipe support on Windows. Lowest-risk change.

Not chosen because:

Alternative 2: Rustler NIF (Embedded Compiler)

Embed the Rust compiler directly into the BEAM node as a NIF using Rustler. Lowest possible call overhead (~0.01 ms), native term exchange, single OTP release with no separate binary.

Deferred to Phase 6 because:

Alternative 3: WebAssembly (Compile beamtalk-core to WASM, run in BEAM)

Compile the Rust compiler to WASM and run it via a WASM runtime (wasmex) inside the BEAM.

Rejected because:

Alternative 4: Rewrite Compiler in Erlang/Elixir

Rewrite the compiler (lexer, parser, codegen) in Erlang or Elixir so it runs natively inside the BEAM with zero IPC overhead. This is what LFE does — the compiler is just Erlang modules, compilation is a function call.

Arguments for:

Not chosen because:

Consequences

Positive

Negative

Neutral

Implementation

Phase 0: Wire Check (S)

Prove the core assumption: the beamtalk-core Rust binary can be invoked as an OTP port, receive a Beamtalk expression on stdin, and return Core Erlang on stdout. Minimal viable slice — no backend dispatch, no REPL integration.

Validation criteria:

Affected components:

Phase 1: Port Backend + Anti-Corruption Layer (M)

Create beamtalk_compiler as a new OTP application with OTP Port as the primary backend.

DDD Alignment: The compiler is its own bounded context (Source Analysis + Semantic Analysis + Code Generation). It becomes a fourth OTP application in the umbrella — an Anti-Corruption Layer translating between the Compilation Context and the Live Programming Domain.

beamtalk_workspace  (Live Programming Domain)
    ↓ depends on both
beamtalk_compiler   beamtalk_runtime    ← peers (independent bounded contexts)
(Compilation)       (Actor/Object System)
                        ↓ depends on
                    beamtalk_stdlib     (Standard Library Context)

beamtalk_compiler and beamtalk_runtime are peers, not layered. The compiler has no dependency on the runtime — it compiles Source → Core Erlang without needing actors, objects, or primitives. Error formatting lives in beamtalk_workspace, which depends on both and can combine compiler diagnostics with runtime context. This also enables standalone compilation tools (e.g., beamtalk check) that don't load the runtime.

The workspace asks the compiler to compile; it never knows how compilation happens (Port vs NIF vs daemon). This preserves the Published Language boundary (Core Erlang IR) from the DDD model.

Affected components:

Phase 2: REPL Integration (M)

Replace daemon IPC in beamtalk_repl_eval.erl with calls to the beamtalk_compiler API, which dispatches through beamtalk_compiler_backend.

Affected components:

Testing:

Phase 3: Build Integration (M)

Move beamtalk build to use the Port-based compiler (via an OTP release or escript).

Affected components:

Phase 4: Precompiled Binaries & Windows (L)

Set up CI cross-compilation matrix for the compiler port binary.

Affected components:

Phase 5: Daemon Removal (S)

Remove daemon code after migration period.

Affected components:

Phase 6 (Future): NIF Backend (M, optional)

If incremental analysis or keystroke-level compilation requires sub-millisecond overhead, add Rustler NIF as an alternative backend behind beamtalk_compiler_backend.

Trigger: Port's ~2 ms overhead becomes a measurable bottleneck in LSP/IDE workflows.

Affected components:

Migration Path

Compiler Backend Selection

During the transition (Phases 2–4), the compiler backend is selectable via environment variable or CLI flag:

%% In beamtalk_compiler_backend.erl (part of beamtalk_compiler app)
%% Compiler-context setting — the workspace depends on beamtalk_compiler
%% but never knows whether compilation uses Port, NIF, or daemon.
compiler_backend() ->
    case os:getenv("BEAMTALK_COMPILER") of
        "daemon" -> daemon;
        "port"   -> port;
        "nif"    -> nif;     %% Phase 6 only
        false    ->
            %% Default changes over time:
            %% Phase 2: daemon (Port opt-in)
            %% Phase 3: port (daemon opt-in)
            application:get_env(beamtalk_compiler, backend, default_backend())
    end.

# Phase 2: Port available but daemon is default
BEAMTALK_COMPILER=port beamtalk repl         # opt-in to Port (workspace-wide)
BEAMTALK_COMPILER=port beamtalk build .      # same env var for build
beamtalk repl                                 # uses daemon (default)

# Phase 3: Port is default, daemon still available
beamtalk repl                                 # uses Port (default)
beamtalk build .                              # uses Port (default)
BEAMTALK_COMPILER=daemon beamtalk repl       # fallback to daemon
beamtalk workspace start --compiler=daemon   # workspace-level flag

# Phase 5: daemon removed
beamtalk repl                                 # Port only
BEAMTALK_COMPILER=daemon beamtalk repl       # warns: "daemon backend removed, using port"

# Phase 6 (future): NIF available as opt-in
BEAMTALK_COMPILER=nif beamtalk repl          # opt-in to NIF for low-latency

The setting lives at the workspace level (not per-REPL-session), so all compilation within a workspace uses the same backend. This allows:

For users

  1. Phases 1–2: Daemon still works and is the default. Set BEAMTALK_COMPILER=port to opt in.
  2. Phase 3: Port becomes the default. Set BEAMTALK_COMPILER=daemon or --compiler=daemon to fall back.
  3. Phase 5: Daemon removed. BEAMTALK_COMPILER=daemon is ignored with a warning, and the Port backend is used.
  4. Phase 6 (future): NIF available as opt-in via BEAMTALK_COMPILER=nif for low-latency workflows.
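
The phase-by-phase rules above can be summarised as a single pure function. This is a sketch in Rust for illustration only, since the actual selection lives in beamtalk_compiler_backend.erl. The type and function names are hypothetical, the application-environment lookup is omitted, and the handling of "nif" before Phase 6 and of unrecognised values (fall back to the phase default with a warning) is an assumption that the text does not specify.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Daemon,
    Port,
    Nif,
}

/// Result of resolving BEAMTALK_COMPILER for a given migration phase.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Selection {
    pub backend: Backend,
    pub warning: Option<String>,
}

/// Default backend when the variable is unset: daemon through Phase 2,
/// Port from Phase 3 onward.
fn default_backend(phase: u8) -> Backend {
    if phase <= 2 {
        Backend::Daemon
    } else {
        Backend::Port
    }
}

/// Resolves the backend from the environment value (None if unset) and phase.
pub fn select_backend(env_value: Option<&str>, phase: u8) -> Selection {
    let plain = |backend| Selection { backend, warning: None };
    match env_value {
        None => plain(default_backend(phase)),
        Some("port") => plain(Backend::Port),
        Some("daemon") if phase < 5 => plain(Backend::Daemon),
        // Phase 5 onward: the daemon no longer exists.
        Some("daemon") => Selection {
            backend: Backend::Port,
            warning: Some("daemon backend removed, using port".to_string()),
        },
        Some("nif") if phase >= 6 => plain(Backend::Nif),
        // Assumption: anything else falls back to the phase default.
        Some(other) => Selection {
            backend: default_backend(phase),
            warning: Some(format!("unsupported backend {:?}, using default", other)),
        },
    }
}
```

For example, select_backend(Some("daemon"), 3) resolves to the daemon, while the same value in Phase 5 resolves to Port with the removal warning.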

For the codebase

Implementation Tracking

Epic: BT-543, Embedded Compiler via OTP Port (ADR 0022)
Progress: 100% complete (8/8 issues done)

| Phase    | Issue  | Title                                              | Size | Status      |
|----------|--------|----------------------------------------------------|------|-------------|
| Baseline | BT-544 | Establish compilation latency baseline             | S    | ✅ Done     |
| Phase 0  | BT-545 | Wire check: OTP Port invokes Rust compiler binary  | S    | ✅ Done     |
| Phase 1  | BT-546 | beamtalk_compiler OTP app with Port backend        | M    | ✅ Done     |
| Phase 2  | BT-547 | Replace daemon IPC in REPL with beamtalk_compiler  | M    | ✅ Done     |
| Perf     | BT-548 | Validate compilation latency improvement           | S    | ✅ Done     |
| Phase 3  | BT-549 | Move beamtalk build to use Port-based compiler     | M    | ✅ Done     |
| Phase 4  | BT-550 | Release CI: Linux distributable workflow           | L    | ✅ Done     |
| Phase 5  | BT-551 | Remove daemon code                                 | M    | ✅ Done     |
| Phase 6  |        | NIF backend (future, optional)                     | M    | ⏳ Deferred |

References