OTP Supervisors
Beamtalk actors are OTP gen_servers. When an actor crashes, you want the system to restart it automatically — that's what supervisors are for.
A supervisor watches a set of actor processes. When a child crashes, the supervisor applies a restart strategy: restart just that child, or restart all of them, depending on the policy.
This chapter covers:
- Supervisor subclass: (a static child list, known at start-up)
- Restart strategies and policies
- DynamicSupervisor subclass: (adding children at runtime)
- Supervision in practice: a resilient counter service
Static supervisors
A static supervisor knows its children at start-up time. Subclass Supervisor
and override class children to return a supervision spec for each actor to supervise:
TestCase subclass: Ch17StaticSupervisor
  testSupervisorClass =>
    // Supervisor subclass: defines a supervision tree.
    // class children => returns the supervision specs of the children.
    self assert: CounterApp strategy equals: #oneForOne
    self assert: CounterApp children size equals: 1
  testSupervisorIsSupervisorFlag =>
    self assert: CounterApp isSupervisor
    self deny: Counter isSupervisor
The CounterApp supervisor is defined like this. It is not run as a doctest,
because it requires an OTP application environment:
Actor subclass: Counter
  state: value = 0
  increment => self.value := self.value + 1
  value => self.value
Supervisor subclass: CounterApp
  class children => #[Counter supervisionSpec]
class children returns an array of SupervisionSpec values, one per child.
The simplest spec is SomeActorClass supervisionSpec, which uses the restart
policy declared by the actor class (#temporary unless the class overrides it).
Supervision specs
A SupervisionSpec describes how to start one supervised child. It is
built from an actor class using fluent setter methods:
TestCase subclass: Ch17SupervisionSpecs
  testDefaultSpecHasTemporaryRestart =>
    spec := Counter supervisionSpec
    self assert: spec restart equals: #temporary
  testCustomRestartPolicy =>
    spec := Counter supervisionSpec withRestart: #permanent
    self assert: spec restart equals: #permanent
  testCustomId =>
    spec := Counter supervisionSpec withId: #mainCounter
    self assert: spec id equals: #mainCounter
  testChainedBuilders =>
    spec := Counter supervisionSpec
      withId: #primary
      withRestart: #permanent
    self assert: spec id equals: #primary
    self assert: spec restart equals: #permanent
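Each builder returns a spec with one field changed, which is why the calls chain. The Python analogue below sketches that fluent-builder pattern. It is not Beamtalk code. SupervisionSpec, supervision_spec, and the with_* methods are hypothetical names. The sketch assumes each builder returns a new spec instead of mutating the receiver; the chapter does not say which approach Beamtalk takes.

```python
from dataclasses import dataclass, replace
from typing import Optional

VALID_RESTARTS = {"temporary", "transient", "permanent"}


@dataclass(frozen=True)
class SupervisionSpec:
    """Hypothetical stand-in for a child spec: what to start and how."""
    actor_class: str
    id: str
    restart: str = "temporary"        # the chapter's default policy
    shutdown: Optional[int] = None    # None means "use the default timeout"

    def with_id(self, new_id: str) -> "SupervisionSpec":
        # Return a copy with only the id changed; self is left untouched.
        return replace(self, id=new_id)

    def with_restart(self, policy: str) -> "SupervisionSpec":
        if policy not in VALID_RESTARTS:
            raise ValueError(f"unknown restart policy: {policy}")
        return replace(self, restart=policy)

    def with_shutdown(self, millis: int) -> "SupervisionSpec":
        return replace(self, shutdown=millis)


def supervision_spec(actor_class: str) -> SupervisionSpec:
    # Assumption for this sketch: the default id is the class name.
    return SupervisionSpec(actor_class=actor_class, id=actor_class)


base = supervision_spec("Counter")
spec = base.with_id("primary").with_restart("permanent")
print(spec.id, spec.restart)   # primary permanent
print(base.id, base.restart)   # Counter temporary
```

Returning copies means a shared base spec can be customized for several children without the variants interfering with each other.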
Actor restart policies
Each actor class can declare its own default restart policy. Override
class supervisionPolicy to change it:
TestCase subclass: Ch17RestartPolicy
  testDefaultPolicyIsTemporary =>
    // Actors default to #temporary: they are not restarted after a crash
    self assert: Counter supervisionPolicy equals: #temporary
  testSpecInheritsActorPolicy =>
    // supervisionSpec picks up the actor's policy automatically
    spec := Counter supervisionSpec
    self assert: spec restart equals: #temporary
Restart policies:
| Policy | Meaning |
|---|---|
| #temporary | Never restarted (default) |
| #transient | Restarted only after an abnormal exit (any reason other than a normal exit or shutdown) |
| #permanent | Always restarted, whatever the exit reason |
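The table reduces to a small decision function. This Python sketch models the standard OTP rule, which Beamtalk's policies are assumed to follow: a transient child is restarted only when its exit reason is something other than normal, shutdown, or a ("shutdown", detail) pair. The function name should_restart and the string exit reasons are hypothetical stand-ins for Erlang atoms, not Beamtalk values.

```python
def should_restart(policy: str, exit_reason) -> bool:
    """Decide whether a supervisor restarts a child that just exited.

    exit_reason models an Erlang exit reason: "normal", "shutdown",
    a ("shutdown", detail) tuple, or anything else (a crash).
    """
    normal_exit = (
        exit_reason in ("normal", "shutdown")
        or (isinstance(exit_reason, tuple) and exit_reason[:1] == ("shutdown",))
    )
    if policy == "permanent":
        return True              # always restarted
    if policy == "transient":
        return not normal_exit   # restarted only after an abnormal exit
    if policy == "temporary":
        return False             # never restarted
    raise ValueError(f"unknown restart policy: {policy}")


for policy in ("temporary", "transient", "permanent"):
    print(policy, should_restart(policy, "normal"), should_restart(policy, "badarith"))
# temporary False False
# transient False True
# permanent True True
```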
To make a worker always restart, override class supervisionPolicy in
the actor class. This is not shown as a doctest, because overriding class methods
requires a class definition:
Actor subclass: PersistentWorker
  class supervisionPolicy => #permanent
  state: value = 0
  increment => self.value := self.value + 1
Restart strategies
Override class strategy in your supervisor class to choose a different strategy:
| Strategy | Meaning |
|---|---|
| #oneForOne | Only restart the crashed child (default) |
| #oneForAll | Restart all children when one crashes |
| #restForOne | Restart the crashed child and all children started after it |
TestCase subclass: Ch17Strategies
  testDefaultStrategyIsOneForOne =>
    self assert: CounterApp strategy equals: #oneForOne
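The Python model below shows which children each strategy touches. It assumes the children list is in start order, as in OTP, where a supervisor starts its children in the order they are listed. That ordering is what #restForOne relies on. children_to_restart is a hypothetical helper. It only computes the affected set: a real supervisor also stops the affected siblings before restarting them and still applies each child's restart policy.

```python
from typing import List


def children_to_restart(strategy: str, children: List[str], crashed: str) -> List[str]:
    """Return the children a supervisor restarts after `crashed` fails.

    `children` is in start order, which is what restForOne depends on.
    """
    if crashed not in children:
        raise ValueError(f"{crashed} is not a child")
    if strategy == "oneForOne":
        return [crashed]
    if strategy == "oneForAll":
        return list(children)
    if strategy == "restForOne":
        return children[children.index(crashed):]
    raise ValueError(f"unknown strategy: {strategy}")


kids = ["db", "cache", "api"]
for strategy in ("oneForOne", "oneForAll", "restForOne"):
    print(strategy, children_to_restart(strategy, kids, "cache"))
# oneForOne ['cache']
# oneForAll ['db', 'cache', 'api']
# restForOne ['cache', 'api']
```

Under this model, #restForOne suits a chain where later children depend on earlier ones but not the other way round.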
Dynamic supervisors
A DynamicSupervisor starts with no children. You add children at
runtime as demand requires. This is ideal for per-request or per-connection
workers.
Subclass DynamicSupervisor and override class childClass to name the actor class it starts:
TestCase subclass: Ch17DynamicSupervisor
  testDynamicSupervisorIsSupervisorFlag =>
    // DynamicSupervisor subclasses are also supervisors
    self assert: WorkerPool isSupervisor
  testDynamicSupervisorChildClass =>
    // Dynamic supervisors don't define class children;
    // they name the single actor class they start instead
    self assert: WorkerPool childClass equals: Counter
A WorkerPool is defined like this:
DynamicSupervisor(Counter) subclass: WorkerPool
  class childClass => Counter
At runtime, send startChild (no arguments) or startChild: (with arguments) to a
running supervisor instance to add workers:
// pool := WorkerPool supervise    // start the supervisor
// pool startChild                 // add a Counter child
// pool count                      // => 1
// pool startChild                 // add another
// pool count                      // => 2
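The bookkeeping in the commented session can be modelled in Python. DynamicPool, start_child, crash, and count are hypothetical names, not the Beamtalk API. The sketch assumes children inherit Counter's default #temporary policy. Under that assumption a crashed child is dropped rather than restarted, so count goes down.

```python
import itertools


class DynamicPool:
    """Hypothetical model of a dynamic supervisor with one child class."""

    def __init__(self, child_class: str, restart: str = "temporary"):
        self.child_class = child_class
        self.restart = restart
        self.children = {}               # child id -> start arguments
        self._ids = itertools.count(1)   # hands out 1, 2, 3, ...

    def start_child(self, args=None) -> int:
        # The pool starts empty; children are added on demand.
        child_id = next(self._ids)
        self.children[child_id] = args
        return child_id

    def crash(self, child_id: int) -> None:
        # A #temporary child is removed, not restarted. Under any other
        # policy the entry stays, standing in for the restarted child.
        if self.restart == "temporary":
            del self.children[child_id]

    def count(self) -> int:
        return len(self.children)


pool = DynamicPool("Counter")
first = pool.start_child()
pool.start_child()
print(pool.count())   # 2
pool.crash(first)
print(pool.count())   # 1
```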
Supervisor lifecycle
TestCase subclass: Ch17Lifecycle
  testSupervisorClassMethods =>
    // supervise: starts the supervisor as an OTP process
    // current: retrieves a running supervisor
    // isSupervisor: true for all Supervisor subclasses
    self assert: CounterApp isSupervisor
    self assert: CounterApp strategy equals: #oneForOne
    self assert: (CounterApp children size) equals: 1
Key class-side API:
| Message | Returns | Description |
|---|---|---|
| MyApp supervise | supervisor pid | Start the supervision tree |
| MyApp current | supervisor pid | Get the running instance |
| MyApp isSupervisor | Boolean | Always true for supervisors |
| MyApp strategy | Symbol | Restart strategy |
| MyApp children | Array | Static child specs |
Key instance-side API:
| Message | Description |
|---|---|
| sup children | List running children |
| sup which: Counter | Find a specific child |
| sup terminate: Counter | Stop a specific child |
| sup count | Count running children |
| sup stop | Shut down the supervisor and all children |
Graceful shutdown timeout
By default, workers get 5000 ms to shut down and nested supervisors get
unlimited time. In OTP, a child that is still running when its timeout expires
is killed outright. Use withShutdown: to override the timeout (in milliseconds)
for children that need time to drain connections or flush state:
HttpServer supervisionSpec withShutdown: 30000 // 30s graceful shutdown
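Here is the shutdown rule as a small Python model of OTP semantics: the supervisor asks the child to stop, waits up to the timeout, then kills it. shutdown_outcome and its parameters are hypothetical. drain_ms stands for how long the child actually needs to finish cleaning up.

```python
import math
from typing import Optional

DEFAULT_WORKER_SHUTDOWN_MS = 5000
DEFAULT_SUPERVISOR_SHUTDOWN_MS = math.inf   # nested supervisors wait indefinitely


def shutdown_outcome(child_type: str, drain_ms: float,
                     shutdown_ms: Optional[float] = None) -> str:
    """Return "clean" if the child finishes within its timeout, else "killed"."""
    if shutdown_ms is None:
        shutdown_ms = (DEFAULT_SUPERVISOR_SHUTDOWN_MS if child_type == "supervisor"
                       else DEFAULT_WORKER_SHUTDOWN_MS)
    return "clean" if drain_ms <= shutdown_ms else "killed"


print(shutdown_outcome("worker", drain_ms=12000))                      # killed
print(shutdown_outcome("worker", drain_ms=12000, shutdown_ms=30000))   # clean
print(shutdown_outcome("supervisor", drain_ms=60000))                  # clean
```

A worker that needs 12 seconds to drain is therefore cut off under the 5000 ms default, which is the case withShutdown: 30000 is meant to cover.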
Summary
Static supervision tree:
Supervisor subclass: MyApp
  class strategy => #oneForOne      // optional, default
  class children => #[
    SomeActor supervisionSpec,
    OtherActor supervisionSpec withRestart: #permanent
  ]
Dynamic supervision tree:
DynamicSupervisor(Counter) subclass: WorkerPool
  class childClass => Counter
// pool := WorkerPool supervise
// pool startChild           // spawn a new Counter
// pool startChild: args     // spawn with arguments
// pool count                // how many children
Restart policies:
#temporary    never restarted (default)
#transient    restarted on abnormal exit only
#permanent    always restarted
Strategies:
#oneForOne     restart only the crashed child (default)
#oneForAll     restart all children when any crashes
#restForOne    restart the crashed child and all children started after it
Exercises
1. Default restart policy. What restart policy does a new actor class have by default? How do you check it?
Hint
Counter supervisionPolicy // => #temporary
All actors default to #temporary, so they are never restarted automatically.
Override class supervisionPolicy (for example, class supervisionPolicy => #permanent) to change this.
2. Custom supervision spec. Create a supervision spec for Counter with
a #permanent restart policy and a custom ID of #mainCounter. Chain the
builder methods.
Hint
spec := Counter supervisionSpec
  withId: #mainCounter
  withRestart: #permanent
spec id // => #mainCounter
spec restart // => #permanent
3. Strategy choice. When would you choose #oneForAll over #oneForOne?
Give a concrete example.
Hint
Use #oneForAll when children depend on each other and can't function
independently. Example: a database connection pool and a cache actor that holds
connections from it. If the pool crashes, the cache's connections are stale, so
the cache must restart too.
#oneForOne (the default) is best when children are independent, like multiple
worker processes handling separate requests.