ADR 0012: Collection Literal Syntax and the # Data Literal System

Status

Implemented (2026-02-15)

Context

Beamtalk currently has no way to express list literals in source code. The language has map literals (#{key => value}), symbol literals (#name), and block literals ([body]), but no collection literal for ordered sequences.

This blocks several features:

Current State

Partial implementation exists but is disconnected:

The [...] Overloading Problem

Beamtalk uses [...] for block (closure) literals:

[42]                    // Block returning 42
[:x | x + 1]           // Block with parameter
[self doSomething]      // Block calling a method

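Using [1, 2, 3] for lists would create ambiguity: [42] could be either a block returning 42 or a one-element list, and [] could be either an empty block or an empty list.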

While commas could disambiguate multi-element cases ([1, 2, 3] vs [expr. expr]), the single-element and empty cases require either trailing commas ([42,]), breaking changes ([] = list), or type-system-level semantic analysis to warn on misuse. This complexity is avoidable.
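
The ambiguity can be made concrete with a small sketch. The Python function below is a hypothetical classifier, not part of the Beamtalk compiler: it looks only at surface syntax (top-level commas, statement periods, a leading block parameter) and reports when syntax alone cannot decide. It ignores string and float literals for brevity.

```python
def classify_brackets(source: str) -> str:
    """Classify a bare [...] form using syntax alone (hypothetical helper)."""
    body = source.strip()[1:-1].strip()
    depth = 0
    has_comma = False
    has_period = False
    for ch in body:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif depth == 0 and ch == ",":
            has_comma = True   # element separator, so list-like
        elif depth == 0 and ch == ".":
            has_period = True  # statement separator, so block-like
    if body.startswith(":") or has_period:
        return "block"
    if has_comma:
        return "list"
    # A single expression or an empty body fits both readings.
    return "ambiguous"


print(classify_brackets("[1, 2, 3]"))     # list
print(classify_brackets("[:x | x + 1]"))  # block
print(classify_brackets("[42]"))          # ambiguous
print(classify_brackets("[]"))            # ambiguous
```

Under this model the single-element and empty forms are exactly the cases that need a trailing comma, a breaking change, or semantic analysis to resolve.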

The # Prefix Already Means "Data"

Beamtalk already uses # as a data literal prefix for symbols (#name, #'quoted') and maps (#{key => value}).

This isn't coincidental — # consistently marks "this is literal data, not executable code." Extending this principle to collection literals creates a coherent, unambiguous design system.

Constraints

  1. Zero ambiguity: No parser disambiguation, no semantic analysis needed to catch misuse
  2. Interactive-first: Must work naturally in REPL sessions
  3. Backward compatible: Existing block syntax [...] must be completely unchanged
  4. Composable: Must work in method arguments, assignments, and pattern matching
  5. Extensible: Design should accommodate future literal types (tuples, regex, binaries)

Decision

Use #(expr, expr, ...) for list literals. The # prefix marks all data literals in Beamtalk, forming a coherent family: #symbol, #{map}, #(list).

Syntax

// List literals
empty := #()                       // Empty list
numbers := #(1, 2, 3)             // List of integers
mixed := #(1, "hello", #ok)       // Heterogeneous list
nested := #(#(1, 2), #(3, 4))    // Nested lists
single := #(42)                   // Single-element list — unambiguous!

// Cons syntax (prepend)
prepended := #(0 | numbers)       // #(0, 1, 2, 3)

// Blocks — completely unchanged
block := [42]                     // Block returning 42
adder := [:x | x + 1]            // Block with parameter
multi := [x := 1. y := 2. x + y] // Multi-statement block
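
To illustrate how little machinery the list forms above need, here is a minimal reader sketch in Python. It is an illustrative model only, not the Beamtalk parser: the names tokenize, parse_expr, parse_list, and read_literal are hypothetical, only integers, double-quoted strings, symbols, and nested lists are supported, the cons tail must itself be a literal list, and trailing commas are rejected as an assumption.

```python
import re

# Tokens: '#(' opener, ')', ',', '|', integers, "strings", #symbols.
TOKEN = re.compile(r'\s*(#\(|\)|,|\||\d+|"[^"]*"|#[A-Za-z_][A-Za-z0-9_]*)')


def tokenize(src):
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    tokens.append("<eof>")  # sentinel so the parser never runs off the end
    return tokens


def parse_expr(tokens, i):
    tok = tokens[i]
    if tok == "#(":
        return parse_list(tokens, i + 1)
    if tok.isdigit():
        return int(tok), i + 1
    if tok.startswith('"'):
        return tok[1:-1], i + 1
    if tok.startswith("#"):
        return ("symbol", tok[1:]), i + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_list(tokens, i):
    """Parse a list body after '#('; return (python_list, next_index)."""
    items = []
    if tokens[i] == ")":
        return items, i + 1  # '#()' is the empty list
    while True:
        value, i = parse_expr(tokens, i)
        items.append(value)
        if tokens[i] == ",":
            i += 1
        elif tokens[i] == "|":
            # Cons form: everything after '|' is the tail list.
            tail, i = parse_expr(tokens, i + 1)
            if not isinstance(tail, list) or tokens[i] != ")":
                raise SyntaxError("cons tail must be a single list")
            return items + tail, i + 1
        elif tokens[i] == ")":
            return items, i + 1
        else:
            raise SyntaxError(f"expected ',' or ')', found {tokens[i]!r}")


def read_literal(src):
    tokens = tokenize(src)
    value, i = parse_expr(tokens, 0)
    if tokens[i] != "<eof>":
        raise SyntaxError("trailing input after literal")
    return value


print(read_literal("#(42)"))              # [42]
print(read_literal("#(0 | #(1, 2, 3))"))  # [0, 1, 2, 3]
```

Because every list starts with the distinct token #(, the reader never has to decide between a list and a block; #(42) and #() parse without special cases.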

The # Data Literal Family

# followed by a delimiter creates a data literal. The delimiter determines the type:

Syntax        Type         BEAM Type     Status
#symbol       Symbol       atom          ✅ Implemented
#'quoted'     Symbol       atom          ✅ Implemented
#{k => v}     Dictionary   map           ✅ Implemented
#(1, 2, 3)    List         linked list   This ADR

The rule: # means data. Bare delimiters mean code.

Bare                      Hash-prefixed
(expr): grouping          #(1, 2, 3): list literal
[body]: block/closure     -
-                         #{k => v}: map literal

Future Data Literals (Not In This ADR)

This design naturally extends to other literal types. These are not decided here but shown to demonstrate the system's extensibility:

Syntax           Type         BEAM Type        Precedent
{1, 2, 3}        Array        tuple (tagged)   Erlang {A, B, C}, Smalltalk Array
#/pattern/       Regex        compiled regex   Clojure #"", Ruby %r{}
#r"raw string"   Raw string   binary           Python r"", Rust r""
#b<<1, 2>>       Binary       bitstring        Erlang <<1,2>>

Array literals ({1, 2, 3}) require a separate ADR to address the Array vs Tuple distinction — user-created arrays (data structures) should be a different class from raw Erlang tuples (FFI interop results like {ok, Value}). The likely approach is runtime tagging ({beamtalk_array, {...}}) following the same pattern as beamtalk_object, but this has significant implementation implications (~10 files, ~125 lines) that deserve their own design review. See "Future Work" section below.

Each future literal type requires its own ADR when the time comes.

REPL Session

> #(1, 2, 3)
#(1, 2, 3)

> numbers := #(10, 20, 30)
#(10, 20, 30)

> numbers head
10

> numbers tail
#(20, 30)

> #(0 | numbers)
#(0, 10, 20, 30)

> numbers collect: [:n | n * 2]
#(20, 40, 60)

> numbers select: [:n | n > 15]
#(20, 30)

> #()
#()

> #(42) head
42

> obj perform: #add:to: withArgs: #(3, 4)
7

Error Examples

> #(1, 2, 3) value
ERROR: List does not understand 'value'
  Hint: 'value' is a Block method. Lists use head, tail, size, collect:, etc.

> (1, 2, 3)
ERROR: Expected ')' after expression, found ','
  Hint: For a list literal, use #(1, 2, 3) — note the # prefix.

> numbers at: 5
ERROR: Index 5 out of bounds for list of size 3
  Hint: List indices start at 1 and go up to 3

> [1, 2, 3]
ERROR: Unexpected ',' in block body
  Hint: For a list literal, use #(1, 2, 3). Blocks use [body] or [:x | body].
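
As a sketch of the 1-based indexing behind the at: error above, the following Python function reproduces the message and hint text. The function name list_at is hypothetical, and the sketch does not special-case the empty list.

```python
def list_at(items, index):
    """Return the element at a 1-based index, or raise with the REPL-style message."""
    size = len(items)
    if not 1 <= index <= size:
        raise IndexError(
            f"Index {index} out of bounds for list of size {size}\n"
            f"  Hint: List indices start at 1 and go up to {size}"
        )
    return items[index - 1]  # translate 1-based to Python's 0-based indexing


numbers = [10, 20, 30]
print(list_at(numbers, 1))  # 10
try:
    list_at(numbers, 5)
except IndexError as err:
    print(f"ERROR: {err}")
```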

BEAM Mapping

Beamtalk     Erlang      BEAM Type                  Class
#(1, 2, 3)   [1, 2, 3]   Linked list (cons cells)   List
#()          []          Empty list                 List
#(h | t)     [H | T]     Cons cell                  List
#{a => 1}    #{a => 1}   Map                        Dictionary
#symbol      symbol      Atom                       Symbol
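
The list rows of this mapping can be sketched as a tiny translation function. The Python below is a hedged illustration only: the node classes (Int, Symbol, Var, ListLit, Cons) and to_erlang are hypothetical names, and it emits Erlang source text for readability, which may not be the representation the real code generator targets. Quoted symbols and maps are left out.

```python
from dataclasses import dataclass


@dataclass
class Int:
    value: int


@dataclass
class Symbol:
    name: str


@dataclass
class Var:
    name: str


@dataclass
class ListLit:
    items: list


@dataclass
class Cons:
    head: object
    tail: object


def to_erlang(node) -> str:
    """Render a (hypothetical) Beamtalk AST node as Erlang source text."""
    if isinstance(node, Int):
        return str(node.value)
    if isinstance(node, Symbol):
        return node.name  # #symbol becomes the bare atom symbol
    if isinstance(node, Var):
        # Erlang variables start with an uppercase letter.
        return node.name[:1].upper() + node.name[1:]
    if isinstance(node, ListLit):
        return "[" + ", ".join(to_erlang(item) for item in node.items) + "]"
    if isinstance(node, Cons):
        return f"[{to_erlang(node.head)} | {to_erlang(node.tail)}]"
    raise TypeError(f"unsupported node: {node!r}")


print(to_erlang(ListLit([Int(1), Int(2), Int(3)])))  # [1, 2, 3]
print(to_erlang(ListLit([])))                        # []
print(to_erlang(Cons(Var("h"), Var("t"))))           # [H | T]
```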

Prior Art

Smalltalk Family

Language        Literal Array                        Dynamic Array                    List
Pharo/Squeak    #(1 2 3): compile-time, immutable    {1. 2. 1+2}: runtime, mutable    OrderedCollection: no literal
Newspeak        Similar to Pharo                     Similar to Pharo                 No literal syntax
GNU Smalltalk   #(1 2 3)                             {1. 2. 3}                        No literal syntax

Smalltalk uses #(...) with space-separated elements. Beamtalk modernizes this with comma separation while keeping the # prefix. The continuity is deliberate — Smalltalk developers will recognize #(...) as "collection literal."

BEAM Languages

Language   List Syntax                 Type          Pattern Match
Erlang     [1, 2, 3]                   Linked list   [H | T]
Elixir     [1, 2, 3]                   Linked list   [h | t]
Gleam      [1, 2, 3]                   Linked list   [first, ..rest]
LFE        '(1 2 3) or (list 1 2 3)    Linked list   Pattern match
Beamtalk   #(1, 2, 3)                  Linked list   #(head | tail)

Beamtalk's #(...) differs from other BEAM languages' [...] by one prefix character. This is a deliberate trade-off: the # prefix buys zero ambiguity, zero breaking changes, and a coherent literal system.

Languages Using Prefix for Data Literals

Language   Prefix System   Examples
Clojure    # dispatch      #{} set, #"" regex, #inst tagged literal
Elixir     ~ sigils        ~r// regex, ~w[] word list, ~c'' charlist
Ruby       % sigils        %w[] word array, %i[] symbol array, %r{} regex
Python     Letter prefix   r"" raw, b"" bytes, f"" format
Rust       Letter prefix   r"" raw, b"" bytes, br"" raw bytes

Beamtalk's # prefix is most similar to Clojure's dispatch character — a single character that says "the next delimiter creates data, not code." The difference: Clojure uses #() for anonymous functions, while Beamtalk keeps [block] for closures and uses #() for lists.

Other Languages with Block/List Ambiguity

Language   Blocks                       Lists        Resolution
Ruby       { |x| x + 1 } or do...end    [1, 2, 3]    Different delimiters
Elixir     fn x -> x + 1 end            [1, 2, 3]    Different syntax entirely
Swift      { x in x + 1 }               [1, 2, 3]    Different delimiters
Beamtalk   [body]                       #(1, 2, 3)   Prefix distinguishes

Most languages avoid the problem by using different delimiters for blocks and lists. Beamtalk achieves this via the # prefix rather than a wholly different bracket type.

User Impact

Newcomer (from Python/JS/Ruby)

#(1, 2, 3) has a small learning cost — one extra character compared to [1, 2, 3]. However, the # prefix is immediately learnable: "hash means data." The comma-separated elements inside are familiar. No trailing-comma gotchas, no ambiguity surprises.

Smalltalk Developer

#(1, 2, 3) will feel natural — it's Smalltalk's #(1 2 3) with commas. The # prefix meaning "literal data" carries over directly. Blocks remain untouched as [body].

Erlang/Elixir Developer

Beamtalk's list syntax differs from Erlang's [1, 2, 3] by one prefix character. Maps already use #{} in both Erlang and Beamtalk (Elixir writes %{}), so #() for lists is a consistent extension. Pattern matching with #(head | tail) maps cleanly to Erlang's [H | T].

Production Operator

Lists compile to standard Erlang linked lists: they are fully observable in Observer, compatible with :recon, and appear unchanged in crash dumps. No wrapper types, no indirection. The #() syntax is purely a source-level convention.

Tooling Developer

Parser is trivial: #( is a two-character token (like #{). No disambiguation logic, no lookahead. LSP can immediately provide list method completions after #(. Error messages are straightforward — if someone writes [1, 2, 3], suggest #(1, 2, 3).
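
A minimal sketch of that lexing rule, in Python, under stated assumptions: lex_hash and the token kind names are hypothetical, quoted symbols (#'quoted') are omitted, and keyword symbols such as #add:to: are assumed to allow colons. The token kind is decided by the single character that follows #, so no later context is consulted.

```python
def lex_hash(src, pos):
    """Lex the token starting at src[pos] == '#'; return (kind, text, next_pos)."""
    if src[pos] != "#":
        raise ValueError("lex_hash must be called at a '#' character")
    nxt = src[pos + 1 : pos + 2]  # empty string when '#' is the last character
    if nxt == "(":
        return ("LIST_OPEN", "#(", pos + 2)
    if nxt == "{":
        return ("MAP_OPEN", "#{", pos + 2)
    if nxt.isalpha() or nxt == "_":
        end = pos + 1
        while end < len(src) and (src[end].isalnum() or src[end] in "_:"):
            end += 1
        return ("SYMBOL", src[pos:end], end)
    raise SyntaxError(f"unexpected character after '#' at offset {pos + 1}")


print(lex_hash("#(1, 2, 3)", 0))          # ('LIST_OPEN', '#(', 2)
print(lex_hash("#{a => 1}", 0))           # ('MAP_OPEN', '#{', 2)
print(lex_hash("#add:to: withArgs:", 0))  # ('SYMBOL', '#add:to:', 8)
```

Once the lexer has emitted LIST_OPEN, the parser knows it is inside a list and reads comma-separated expressions until the closing parenthesis.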

Steelman Analysis

For #(1, 2, 3) — Prefixed Parens (Recommended)

For [1, 2, 3] — BEAM-native

For #[1, 2, 3] — Prefixed Brackets

Tension Points

Concern                    #(...)                     [...,]                   #[...]
Ambiguity                  None                       [42] block vs list       None
Backward compat            Full                       [] changes meaning       Full
BEAM familiarity           Close (one char)           Identical                Close (one char)
Single-element             #(42): clean               [42,]: trailing comma    #[42]: clean
Semantic analysis needed   No                         Yes (L effort)           No
Extensible design          ✅ Family: #(), #[], #{}   ❌ Ad hoc                ✅ Same family
Smalltalk heritage         #(...) is Smalltalk        ❌ Departs               Partial

The decisive factor: [...,] requires building type inference (2-3 weeks, BT-140) just to provide decent error messages for the [42] block-vs-list confusion. #(...) makes that mistake structurally impossible, freeing the type system work to focus on actual language semantics rather than mitigating syntax ambiguity.

Alternatives Considered

Alternative A: [1, 2, 3] — BEAM-native Brackets

numbers := [1, 2, 3]
empty := []
single := [42,]          // Trailing comma required for single-element

Rejected because:

Alternative B: #[1, 2, 3] — Prefixed Brackets

numbers := #[1, 2, 3]
empty := #[]
single := #[42]           // Unambiguous

Not rejected outright — this is a reasonable alternative. However:

Alternative C: List(1, 2, 3) — Message Send

numbers := List with: 1 with: 2 with: 3

Rejected because:

Consequences

Positive

Negative

Neutral

Implementation

Phase 1: Lexer + Parser (S)

Files:

Phase 2: Codegen + Tests (S)

Files:

Phase 3: Pattern Matching (M)

Phase 4: Stdlib + Runtime Dispatch (M)

Future Work: Array Literals and Tuple/Array Separation

Array literal syntax ({1, 2, 3}) is intentionally deferred from this ADR. The key design question — how to distinguish user-created arrays from raw Erlang tuples — requires its own ADR.

The Problem

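Erlang uses tuples for two fundamentally different purposes: as fixed-size data structures (what Beamtalk would call arrays) and as tagged results such as {ok, Value} that cross the FFI boundary.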

These have different interfaces (arrays want at:, collect:; tuples want isOk, unwrap) but are the same BEAM type (is_tuple/1).

Likely Direction: Runtime Tagging

The preferred approach is tagging user arrays at runtime, following the same pattern as beamtalk_object:

%% User writes {1, 2, 3}, codegen produces:
{beamtalk_array, {1, 2, 3}}          %% Tagged → class 'Array'

%% Erlang FFI returns {ok, 42}:
{ok, 42}                              %% Raw → class 'Tuple'

%% class_of dispatch:
class_of({beamtalk_array, _}) -> 'Array';
class_of(X) when is_tuple(X) -> 'Tuple';

This separates the classes cleanly but has significant implementation implications (~10 files, ~125 lines) including pattern matching, interop boundaries, and display formatting. A dedicated ADR should address:

  1. Syntax: {1, 2, 3} (bare braces) — likely choice since braces have no code use
  2. Runtime representation: tagged vs untagged
  3. Pattern matching: {a, b} := point must handle the tag
  4. Interop: asTuple / asArray conversion methods
  5. Naming: Array (Smalltalk-compatible) vs Tuple (Erlang-compatible)
  6. Impact on existing beamtalk_tuple.erl and beamtalk_primitive.erl
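
The tagged-versus-raw dispatch described in this section can be modeled in a few lines of Python. This is a sketch under assumptions, not the runtime's implementation: Erlang atoms are modeled as strings, tuples as Python tuples, and as_array / as_tuple are hypothetical stand-ins for the asArray / asTuple conversions listed above.

```python
ARRAY_TAG = "beamtalk_array"  # models the Erlang atom beamtalk_array


def class_of(value):
    """Mirror the two class_of clauses: tagged tuples are Arrays, other tuples are Tuples."""
    if isinstance(value, tuple):
        if len(value) == 2 and value[0] == ARRAY_TAG:
            return "Array"
        return "Tuple"
    raise TypeError("this sketch only models tuple dispatch")


def as_array(raw):
    """Wrap a raw tuple in the array tag (hypothetical conversion)."""
    return (ARRAY_TAG, raw)


def as_tuple(value):
    """Strip the array tag if present; raw tuples pass through unchanged."""
    return value[1] if class_of(value) == "Array" else value


print(class_of(as_array((1, 2, 3))))  # Array
print(class_of(("ok", 42)))           # Tuple
print(as_tuple(as_array((1, 2, 3))))  # (1, 2, 3)
```

The model also shows why clause order matters: the tagged clause must be tried first, because a tagged array is itself a two-element tuple.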

Migration Path

No migration needed — list literals are a new feature.

Documentation updates:

References