ADR 0037: Collection Class Hierarchy — Shallow Abstract Layer with Minimal Primitive Surface

Status

Implemented (2026-02-24)

Context

When BT-443 was filed, Beamtalk's collection types (List, Dictionary, Set, Tuple) were completely flat — each an independent Object subclass with its own dispatch/3 module and no shared protocol. This caused duplicated method implementations across types (every type had its own size, isEmpty, includes:, do:, collect:, select:, reject:, inject:into:) and meant that polymorphism across collection types was impossible.

Smalltalk-80 solves this with a deep hierarchy:

Collection (abstract)
  ├── SequenceableCollection (abstract — ordered, indexed)
  │     ├── ArrayedCollection
  │     │     ├── Array
  │     │     └── String
  │     ├── OrderedCollection
  │     └── LinkedList
  ├── Set
  │     └── Dictionary
  └── Bag

This ADR decides how much of that hierarchy Beamtalk should adopt, given the constraints of the BEAM runtime, our sealed primitive types backed by Erlang BIFs, and the v0.1 scope.

Current state (at time of decision)

Constraints

Decision

We adopt a shallow single-layer hierarchy: one abstract class (Collection) directly above five sealed concrete types. No intermediate SequenceableCollection or other abstract subclasses for v0.1. String is a Collection.

Object
  └── Collection (abstract)
        ├── List     (sealed — Erlang linked list)
        ├── Set      (sealed — Erlang ordsets)
        ├── Dictionary (sealed — Erlang map)
        ├── Tuple    (sealed — Erlang tuple)
        └── String   (sealed — Erlang binary, grapheme-cluster elements)

Primitive surface

Collection defines exactly three subclass-responsibility methods. All sealed concrete classes implement them as @primitive:

MethodReason
do: block: Block -> NilBacked by lists:foreach, maps:foreach, etc. — BIF dispatch per type
size -> IntegerBacked by erlang:length, maps:size, ordsets:size, erlang:tuple_size
printString -> StringType-specific formatting ("List(…)", "Set(…)", etc.)

Every other shared collection method is pure Beamtalk on Collection, built on do: and size. This is identical to Pharo's design, where do: is the primitive boundary and all higher-order operations compose on it.

Shared protocol on Collection (pure Beamtalk)

abstract Object subclass: Collection

  size -> Integer => self subclassResponsibility
  do: block: Block -> Nil => self subclassResponsibility
  printString -> String => self subclassResponsibility

  isEmpty -> Boolean => self size =:= 0
  isNotEmpty -> Boolean => self isEmpty not

  includes: element -> Boolean =>
    self do: [:each | each =:= element ifTrue: [^true]].
    false

  inject: initial into: block: Block => @primitive "inject:into:"   // (*)

  collect: block: Block =>
    (self inject: #() into: [:acc :each | acc addFirst: (block value: each)]) reversed

  select: block: Block =>
    (self inject: #() into: [:acc :each |
      (block value: each) ifTrue: [acc addFirst: each] ifFalse: [acc]
    ]) reversed

  reject: block: Block =>
    self select: [:each | (block value: each) not]

  detect: block: Block =>
    self do: [:each | (block value: each) ifTrue: [^each]].
    nil

  detect: block: Block ifNone: noneBlock: Block =>
    self do: [:each | (block value: each) ifTrue: [^each]].
    noneBlock value

  anySatisfy: block: Block -> Boolean =>
    self do: [:each | (block value: each) ifTrue: [^true]].
    false

  allSatisfy: block: Block -> Boolean =>
    self do: [:each | (block value: each) ifFalse: [^false]].
    true

  asString -> String => self printString

(*) inject:into: is kept as @primitive on the abstract class. The pure-BT version using local-variable mutation with do: does not thread state correctly for abstract-class methods in v0.1: the compiler emits lists:foreach (no accumulator) rather than lists:foldl for this pattern. Concrete classes provide their own @primitive overrides anyway, so this only affects user-defined subclasses. See ADR-0034.

Concrete classes keep BIF-backed overrides

Each sealed class retains @primitive overrides for every performance-critical method:

sealed Collection subclass: List
  do: block: Block -> Nil    => @primitive "do:"       // lists:foreach
  size -> Integer            => @primitive "size"      // erlang:length
  collect: block: Block      => @primitive "collect:"  // lists:map
  select: block: Block       => @primitive "select:"   // lists:filter
  inject: initial into: b    => @primitive "inject:into:"  // lists:foldl
  // ... etc.

The pure-BT abstract defaults serve user-defined Collection subclasses that don't override individual methods.

Species Pattern

collect:, select:, and reject: return a collection of the receiver's type, not always List. This is implemented functionally via a class-side withAll: factory on each sealed type:

// On Collection (abstract)
species => self class

collect: block: Block =>
  result := (self inject: #() into: [:acc :each | acc addFirst: (block value: each)]) reversed.
  self species withAll: result

select: block: Block =>
  result := (self inject: #() into: [:acc :each |
    (block value: each) ifTrue: [acc addFirst: each] ifFalse: [acc]
  ]) reversed.
  self species withAll: result

reject: block: Block =>
  self select: [:each | (block value: each) not]

Each sealed type provides a class-side withAll: factory:

Typeclass withAll: listNotes
Listreturns list unchangedidentity
Setdeduplicatesordsets:from_list
Arrayallocates arrayarray:from_list
Stringjoins graphemeslist join
Tupleerlang:list_to_tupleexisting BIF
Dictionaryoverrides collect: entirelyblock receives values; result maps original keys to transformed values

species is overridable — user-defined subclasses can override it to return a different class if needed. The metaclass tower (ADR-0036) makes self class a fully dispatchable first-class object, so self species withAll: is a normal class-side message send.

#(1, 2, 3) collect: [:x | x * 2]           // => #(2, 4, 6)      (List)
((Set new add: 1) add: 2) select: [:x | x > 1]   // => Set(2)     (Set)
#[10, 20, 30] collect: [:x | x + 1]         // => #[11, 21, 31]   (Array)
"hi" collect: [:g | g uppercase]             // => #("H", "I")     (List — String>>collect: returns List since grapheme-mapped results aren't necessarily valid String graphemes)

String exception: String>>collect: returns List rather than String, because the result of mapping over graphemes is not guaranteed to be a valid string of graphemes. String>>select: returns String (filtered graphemes remain valid). String>>withAll: is list join and is used by select:.

Type annotation narrowing with species

Note: This table reflects the implemented state as of BT-822. collect:, select:, and reject: on concrete classes return the receiver's type via self species withAll:. String>>collect: remains -> List (mapped graphemes are not guaranteed valid). String>>select: and reject: return -> String (filtered graphemes are always valid graphemes).

The abstract Collection methods carry the widest safe return type. Concrete classes narrow the return type via covariant override:

Classcollect: return typeselect: return typereject: return type
Collection (abstract)-> Collection-> Collection-> Collection
List-> List-> List-> List
Set-> Set-> Set-> Set
Array-> Array-> Array-> Array
Tuple-> Tuple-> Tuple-> Tuple
String-> List-> String-> String
Dictionary-> Dictionaryn/a (overrides entirely)n/a

Code written against the abstract Collection type gets -> Collection — correct but wide. Code using a concrete type statically (e.g., a List variable) gets the narrower concrete return type. This is covariant return type narrowing: the concrete override is a subtype of the abstract declaration.

String>>collect: is the only exception where the return type is not the receiver's type — mapping graphemes may produce a heterogeneous list, not a valid string. String>>select: and reject: do return String because filtered graphemes are always valid graphemes.

The implementation requires a new Array type (see below) before Array>>withAll: can be provided. Until then, Array is not yet in the hierarchy.

String is a Collection

String is sealed Collection subclass: String. In Smalltalk-80 and Pharo, String is a SequenceableCollection of Character; Beamtalk follows this spirit but with a grapheme-cluster element type rather than a codepoint-integer Character.

Element type: Iteration over a String yields single-grapheme-cluster String values. String>>at: returns a String (the grapheme at that position), not a Character (which is sealed Integer subclass: Character — a codepoint integer). This is the correct semantic for a UTF-8 binary: a grapheme cluster may span multiple codepoints, so the natural atomic unit is a one-grapheme String.

"hello" do: [:g | Transcript show: g]   // g is a String, each a single grapheme
"hello" at: 1                           // => "h"  (a String)
"hello" collect: [:g | g uppercase]     // => #("H", "E", "L", "L", "O")  (a List of Strings)

Resolved implementation gaps: The following changes bring String into full Collection conformance:

ChangeResolution
do:Renamed from each:do: to match Collection protocol
select: return typeRemains -> String — the Erlang primitive returns a binary; filtered graphemes are always valid graphemes. List>>join / join: added for callers who do need a List.
includes:Renamed to includesSubstring: for substring containment; Collection default includes: handles grapheme membership
at: return typeChanged from -> Character to -> String (grapheme-cluster, matching BEAM runtime)

REPL examples

// Polymorphism across concrete types
#(1, 2, 3) size                            // => 3
#{#a => 1, #b => 2} size                   // => 2
((Set new add: 1) add: 2) size             // => 2
"hello" size                               // => 5

// Shared abstract protocol works on all types
#(1, 2, 3) isEmpty                         // => false
#(1, 2, 3) includes: 2                     // => true
#(1, 2, 3) collect: [:x | x * 2]          // => #(2, 4, 6)
#{#a => 1, #b => 2} collect: [:v | v * 2] // => #{#a => 2, #b => 4}  (Dictionary — maps values, preserving keys)
"hi" collect: [:g | g uppercase]           // => #("H", "I")

// User-defined Collection subclass inherits defaults
Collection subclass: Range
  from: start: Integer to: end: Integer => self new init: start end: end

  init: start end: end =>
    self start := start.
    self end := end

  size -> Integer => self end - self start + 1

  do: block: Block -> Nil =>
    i := self start.
    [i <= self end] whileTrue: [
      block value: i.
      i := i + 1
    ]

  printString -> String => "Range({self start}..{self end})"

r := Range from: 1 to: 5.
r size             // => 5
r isEmpty          // => false
r includes: 3      // => true
r collect: [:x | x * x]  // => #(1, 4, 9, 16, 25)
r select: [:x | x isOdd] // => #(1, 3, 5)
r inject: 0 into: [:sum :x | sum + x]  // => 15  (works via to_list → do: dispatch)

Note on inject:into: for user-defined subclasses: The abstract @primitive inject:into: on Collection works generically for any subclass that implements do:. The runtime helper beamtalk_collection_ops:inject_into/3 converts the receiver to a list by dispatching do: on the object, then folds over the result. User-defined subclasses only need to override inject:into: if they require custom accumulation semantics (e.g., Dictionary's key-preserving fold over values).

Error examples

// Attempting to instantiate the abstract class
Collection new
// => Error: Collection is abstract and cannot be instantiated

// Missing do: override in a Collection subclass
Collection subclass: BadCollection
  size -> Integer => 3
  printString -> String => "BadCollection"

BadCollection new do: [:x | x]
// => Error: subclassResponsibility — Collection>>do: not overridden

Prior Art

Pharo / Squeak Smalltalk

Pharo's Collection is the gold standard: abstract class with do: as the sole primitive, all higher-order operations (collect:, select:, reject:, detect:, inject:into:) as pure Smalltalk building on do:. The species pattern (species returns the result class, copyEmpty creates the accumulator) handles the question "what type does collect: return on a Set?". String is a SequenceableCollection of Character.

What we adopt: do: as the primitive boundary; pure-BT higher-order operations on the abstract class.

What we adapt: The species pattern — collect:, select:, reject: returning the receiver's type — is adopted but implemented functionally rather than with Pharo's mutation-based copyEmpty + add:. Each sealed type provides a class-side withAll: factory; Collection>>species returns self class; the abstract implementations end with self species withAll: result. See Species Pattern section below.

What we defer: SequenceableCollection. Pharo's intermediate layer requires at: as a shared protocol with consistent performance contracts across types — deferred to a future ADR.

Newspeak

Minimises the VM surface to allocation, dispatch, and I/O. Collections are entirely self-hosted. No separate abstract layer for "sequenceable" — the hierarchy is flatter than Pharo's.

What we adopt: The aspiration of minimising the Erlang primitive surface to a fixed interface.

Gleam

Gleam's list module uses @external(erlang, ...) only for length, reverse, flatten. All higher-order functions (map, filter, fold, any, all) are pure Gleam using the prepend-then-reverse accumulator pattern ([element | acc] + lists.reverse). There is no collection hierarchy — each type has its own module.

What we adopt: The O(1) prepend + reversed pattern for accumulator-based operations. addFirst: (BT-814) serves the same role as [H|T] in Gleam.

What we consciously diverge from: Gleam's flat per-module model. We add the abstract Collection layer to enable polymorphism and default implementations — a feature Gleam doesn't have because it targets both Erlang and JavaScript.

Elixir

Enumerable is a protocol, not an abstract class. Any type can declare Enumerable conformance by implementing three callbacks. Enum module functions work on anything implementing Enumerable. There is no inheritance relationship between List, Map, MapSet.

What we observe: The protocol approach is more flexible than inheritance for adding enumerable behaviour to types that don't share an ancestor. This is relevant to ADR-0025 (Gradual Typing). For v0.1, single-inheritance from Collection is simpler and sufficient.

Possible future direction: If ADR-0025 delivers typed protocols, Enumerable as a protocol could supersede or complement the Collection abstract class. The current design doesn't preclude this.

User Impact

Newcomer (coming from Python/Ruby/JS): The Collection abstract class is invisible day-to-day. Users work with List, Set, Dictionary, Tuple directly. The benefit surfaces when they write a custom container: it inherits working collect:, select:, detect:, etc. for free. Discoverability: Collection class in the REPL shows the abstract API; Collection subclasses lists all registered subclasses.

Smalltalk developer: The Collectiondo: → higher-order operations pattern is canonical Smalltalk. The absence of SequenceableCollection will be noticed, but the pattern is the same. String as a Collection is consistent with Pharo, though iterating grapheme-cluster Strings rather than Character codepoints is a departure. The species pattern (collect: on a Set returns a Set, etc.) matches Pharo's design; String>>collect: returns List rather than String (since mapped graphemes aren't guaranteed valid). The Tuple inclusion as a Collection is unusual but follows naturally from BEAM tuple semantics.

Erlang/BEAM developer: List collect: still compiles to lists:map/2 — the BIF path is unchanged. The abstract defaults only activate for non-standard collection subclasses. Stack traces for standard types are unaffected. Tuple as a Collection maps to erlang:tuple_to_list + iteration, which is familiar from Erlang's own tuple_to_list/1.

Production operator: No change to runtime behavior for existing code. Abstract Collection methods add one message-send layer only for user-defined subclasses, not for List/Set/Dictionary/Tuple. Memory overhead: zero — Collection is a class object, not a per-instance allocation.

Tooling developer: Collection as a declared abstract type enables LSP to show the complete shared protocol for any collection type through superclass lookup. printString as abstract means the compiler and LSP can verify all Collection subclasses implement it.

Steelman Analysis

Option B: Full Smalltalk Hierarchy (SequenceableCollection etc.)

Why rejected for v0.1: List (O(n) at:), Tuple (O(1) at:), and String (O(n) grapheme at:) have fundamentally different performance contracts for indexed access — grouping them under a shared SequenceableCollection with at: would misrepresent these contracts. Tuple is indexed but immutable — it doesn't need OrderedCollection semantics. The complexity cost outweighs the benefit at v0.1 scope. Revisit when a shared ordered-type protocol is needed.

Option C: Traits / Protocol (no abstract class)

Why rejected for v0.1: ADR-0025 (Gradual Typing) is Accepted but not implemented. We can't use typed protocols as the primary mechanism for shared collection behaviour until the protocol system exists. The current abstract class design is a valid starting point that doesn't preclude an Enumerable protocol being layered on later.

Option D: Flat — No Shared Abstract Class

Why rejected: User-defined collection subclasses need default implementations. Without Collection, every custom container must implement collect:, select:, reject:, etc. from scratch. This was the status quo before BT-815 and it was a known gap.

Tension Points

Alternatives Considered

SequenceableCollection between Collection and List/Tuple

A SequenceableCollection abstract class could group ordered, indexed types (List, Tuple) and share at:, first, last, from:to:, reversed, eachWithIndex: at that level.

Rejected for v0.1. The fundamental problem is the at: performance contract:

Typeat: complexityBacking structure
TupleO(1)Erlang tuple — contiguous memory, element/2 is pointer arithmetic
ListO(n)Erlang linked list — lists:nth/2 walks n nodes
StringO(n)Erlang binary — grapheme clusters are variable-width UTF-8, indexing requires scanning from the start

There is no BEAM primitive that gives O(1) dynamic random access to a variable-length sequence. The array module provides O(log n) access via a sparse tree, but there is no Beamtalk type backed by it yet. Erlang's own idiom is to avoid index-based access on lists entirely.

A SequenceableCollection abstract class with at: would encode a false uniformity: code written against the abstraction might loop 1 to: coll size do: [:i | coll at: i] assuming reasonable random access, producing O(n²) on List and String. In Pharo this problem is less acute because Array (O(1)) is the primary sequence type and OrderedCollection wraps it — SequenceableCollection>>at: is O(1) for all practical Pharo types.

Methods that could be safely shared (first, last, reversed) are too thin to justify the layer. A well-designed SequenceableCollection would need either: (a) no at: in the shared contract, or (b) a new Vector/Array type backed by the Erlang array module as the primary SequenceableCollection leaf — leaving List under Collection with no indexed-access protocol. Either path is a design decision in itself. Deferred pending that analysis.

String as a separate SequenceableCollection branch

Rather than putting String directly under Collection, it could be placed under a SequenceableCollection along with List and Tuple, since all three support at:, first/last, indexed access. Rejected for v0.1 for the same reasons SequenceableCollection is deferred generally — List (O(n) at:) and Tuple (O(1) at:) have different performance contracts that SequenceableCollection would misrepresent. Placing String directly under Collection is the consistent v0.1 choice.

No species pattern — always return List from accumulator operations

Pharo's species method returns the appropriate result class so aSet collect: [...] returns a Set, not an Array. An alternative is to always return List from abstract accumulator operations. Arguments for this simpler approach: (a) implementation is trivial — no constructor protocol needed, (b) the most common use case is transforming to a list anyway.

Rejected in favour of the species pattern (BT-822): the semantics are genuinely surprising to Smalltalk users ("I filtered a Set, why do I get a List?") and the constructor protocol (withAll:) is straightforward for the sealed concrete types. The metaclass tower (ADR-0036) makes self species withAll: a normal class-side message send. The species pattern is implemented in the abstract class in terms of self species withAll:, with each sealed type providing the factory. Until BT-822 is complete, the fallback is to return List.

Consequences

Positive

Negative

Neutral

Implementation

This ADR documents decisions that are already implemented as of 2026-02-23. The work was done across multiple issues tracked under ADR-0034:

PhaseIssueStatusDescription
Hierarchy structureBT-443 (this ADR)✅ AcceptedDesign decision documented
addFirst: primitiveBT-814✅ ImplementedO(1) list cons for accumulator pattern
Abstract Collection protocolBT-815✅ Implemented10 pure-BT methods on Collection.bt
Number.bt precedentBT-334✅ ImplementedAbstract numeric superclass pattern

All ADR-0034 implementation work is now complete:

PhaseIssueStatusDescription
Phase 1BT-813✅ DoneFuture.bt and FileHandle.bt as @primitive stubs
Phase 2aBT-814✅ DoneaddFirst: O(1) list cons primitive
Phase 2bBT-815✅ DoneAbstract Collection protocol in pure Beamtalk
Phase 3BT-816✅ DoneSelf-host List algorithmic ops (indexOf:, eachWithIndex:)
Phase 3BT-817✅ DoneSelf-host Tuple unwrap ops
Phase 3BT-818✅ DoneSelf-host Dictionary keysAndValuesDo: and Association formatting
Phase 4BT-819✅ DoneSelf-host TestCase assertions

Resolved gaps addressed by this ADR:

TopicStatusResolution
String>>do: primitive✅ DoneRenamed each:do:
String>>select: return type✅ DoneRemains -> String; List>>join / join: added for callers who need a List
String>>includes: override✅ DoneRenamed to includesSubstring:; Collection default handles grapheme membership

Future design work deferred by this ADR:

TopicWhen to revisitPrerequisite
Species pattern + Array type✅ Done — BT-822Array type (#[...] syntax), species pattern, withAll: on all sealed types
SequenceableCollectionAfter Array is addedArray (O(log n) at:) + Tuple (O(1) at:) + String (O(n) at:) gives three meaningful subtypes; at: performance contracts still diverge so careful design needed
Enumerable as typed protocolAfter ADR-0025 (Gradual Typing) is implementedADR-0025 implementation

Migration Path

This ADR introduces three breaking changes to String:

Old APINew APISearch/replace
includes: (substring test)includesSubstring:includes:includesSubstring: on String receivers
each: (iteration)do:each:do: on String receivers
at: returns Characterat: returns StringCode expecting a Character (integer) from String>>at: must adapt to receiving a single-grapheme String

Since Beamtalk is pre-v0.1, no deprecation period or codemod is provided. The renames were applied to all existing stdlib tests and examples as part of this ADR.

References