1️⃣ What is process linking (at the lowest level)?
Linking = shared fate
When two processes are linked:
spawn_link(fn -> work() end)
or
Process.link(pid)
they form a bidirectional failure relationship.
Rule (very important)
If one linked process crashes, the other also crashes
(unless it is trapping exits)
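A minimal sketch of the rule, assuming nothing beyond the standard library. `LinkDemo` is a hypothetical name; the experiment runs inside a throwaway parent process so the shell or script calling it survives:

```elixir
defmodule LinkDemo do
  # Runs the experiment in a throwaway "parent" and returns the reason
  # that parent died with.
  def run do
    {parent, ref} =
      spawn_monitor(fn ->
        # The child is linked to this parent and exits abnormally right away.
        spawn_link(fn -> exit(:boom) end)
        # The parent would sleep forever; the link is what takes it down.
        Process.sleep(:infinity)
      end)

    receive do
      {:DOWN, ^ref, :process, ^parent, reason} -> reason
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(LinkDemo.run())  # → :boom
```

The parent never calls exit itself. It dies with the child's reason because the exit signal travelled over the link.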
Case A: Parent and child are linked (what you get from spawn_link and OTP start_link functions; plain spawn does not link)
parent ─── linked ─── child
Scenario 1: Child crashes
child crashes → exit signal sent → parent crashes
Scenario 2: Parent crashes
parent crashes → exit signal sent → child crashes
📌 Linking is symmetric
This is intentional:
- No orphan processes
- Fail fast
- Fail loudly
2️⃣ What is exit trapping?
Normally, an exit signal with any reason other than :normal kills the process that receives it.
But a process can say:
Process.flag(:trap_exit, true)
Now incoming exit signals become messages instead of killing the process (the one exception is :kill, which cannot be trapped).
With trapping enabled
Child crashes → parent receives a message
{:EXIT, child_pid, reason}
instead of dying.
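The same crash as before, but with trapping turned on, might look like this. `TrapDemo` is a hypothetical name, and the trapping process is wrapped in a Task only so the flag does not leak into the caller:

```elixir
defmodule TrapDemo do
  def run do
    task =
      Task.async(fn ->
        # From here on, exit signals from linked processes arrive as messages.
        Process.flag(:trap_exit, true)
        child = spawn_link(fn -> exit(:boom) end)

        receive do
          {:EXIT, ^child, reason} -> {:trapped, reason}
        after
          1_000 -> :timeout
        end
      end)

    Task.await(task)
  end
end

IO.inspect(TrapDemo.run())  # → {:trapped, :boom}
```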
Important distinctions
| Scenario | Without trap | With trap |
|---|---|---|
| Linked process crashes | You crash | You get a message |
| You crash | Linked process crashes | Linked process crashes |
| Linked process exits normally (:normal) | Ignored | You get a message ({:EXIT, pid, :normal}) |
📌 Trapping is asymmetric — only the trapping process is protected.
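Two edge cases are worth checking by hand: what a trapping process sees when a linked process exits with :normal, and what happens when a trapping process is sent :kill. A small sketch (module and function names are hypothetical):

```elixir
defmodule ExitCases do
  # A linked process finishes normally while we trap exits.
  def normal_exit_with_trap do
    task =
      Task.async(fn ->
        Process.flag(:trap_exit, true)
        pid = spawn_link(fn -> :ok end)

        receive do
          {:EXIT, ^pid, reason} -> reason
        after
          1_000 -> :no_message
        end
      end)

    Task.await(task)
  end

  # A process that traps exits is sent the untrappable :kill signal.
  def kill_with_trap do
    {pid, ref} =
      spawn_monitor(fn ->
        Process.flag(:trap_exit, true)
        Process.sleep(:infinity)
      end)

    Process.exit(pid, :kill)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      1_000 -> :still_alive
    end
  end
end

IO.inspect(ExitCases.normal_exit_with_trap())  # → :normal
IO.inspect(ExitCases.kill_with_trap())         # → :killed
```

So a trapping process does hear about normal exits of linked processes, and trapping offers no protection against :kill.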
Why trapping exists
To allow:
- Supervisors
- Fault monitors
- Restart logic
- Controlled recovery
3️⃣ Why you should almost never manually trap exits
Because OTP already solved this with Supervisors.
Manual trapping is:
- Easy to get wrong
- Hard to reason about
- Rarely needed in application code
4️⃣ How Supervisors actually work (important)
A supervisor is just a process that:
- Traps exits
- Links to children
- Restarts them based on a strategy
That’s it.
Supervisor crash behavior
Supervisor does NOT die when child crashes
Because it has:
Process.flag(:trap_exit, true)
So instead of dying, it receives:
{:EXIT, child_pid, reason}
and decides what to do.
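To make "trap, link, restart" concrete, here is a deliberately tiny hand-rolled version. `TinySupervisor` is a hypothetical teaching module, not how OTP's Supervisor is implemented; it has no restart limits, no strategies, and only one child:

```elixir
defmodule TinySupervisor do
  # `child_fun` is the work each child runs; `notify` is a pid that is sent
  # {:child_started, pid} whenever a child is (re)started.
  def start(child_fun, notify) do
    spawn(fn ->
      Process.flag(:trap_exit, true)
      loop(child_fun, notify, start_child(child_fun, notify))
    end)
  end

  defp start_child(child_fun, notify) do
    pid = spawn_link(child_fun)
    send(notify, {:child_started, pid})
    pid
  end

  defp loop(child_fun, notify, child) do
    receive do
      # The child finished normally: nothing left to supervise.
      {:EXIT, ^child, :normal} ->
        :ok

      # The child crashed: start a replacement and keep watching.
      {:EXIT, ^child, _reason} ->
        loop(child_fun, notify, start_child(child_fun, notify))
    end
  end
end

sup = TinySupervisor.start(fn -> Process.sleep(:infinity) end, self())

first =
  receive do
    {:child_started, pid} -> pid
  end

Process.exit(first, :kill)

second =
  receive do
    {:child_started, pid} -> pid
  end

IO.inspect(first != second)  # → true
Process.exit(sup, :kill)
```

In real code, use Supervisor. The sketch only shows that there is no magic underneath.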
5️⃣ Supervisor restart strategies (key concept)
:one_for_one (most common)
child A crashes → restart only child A
:one_for_all
child A crashes → terminate all children → restart all
:rest_for_one
child A crashes → terminate all children started after A → restart A + those children
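The three strategies can be compared with a small experiment. This is a sketch, assuming three Agents as stand-in workers started in the order :a, :b, :c; `StrategyDemo` and its helpers are hypothetical names. It kills the middle child and reports which children ended up with a new pid (the supervisor will also log an error report for the killed child):

```elixir
defmodule StrategyDemo do
  def run(strategy) do
    children =
      for id <- [:a, :b, :c] do
        %{id: id, start: {Agent, :start_link, [fn -> 0 end]}}
      end

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    old = pids(sup)

    Process.exit(old.b, :kill)
    new = await_restart(sup, :b, old.b)
    Supervisor.stop(sup)

    Map.new(old, fn {id, pid} ->
      {id, if(new[id] == pid, do: :kept, else: :restarted)}
    end)
  end

  # Map of child id => current pid, as reported by the supervisor.
  defp pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup),
        into: %{},
        do: {id, pid}
  end

  # Polls until the supervisor reports a fresh pid for `id`.
  defp await_restart(sup, id, old_pid) do
    current = pids(sup)

    case current do
      %{^id => pid} when is_pid(pid) and pid != old_pid ->
        current

      _ ->
        Process.sleep(10)
        await_restart(sup, id, old_pid)
    end
  end
end

IO.inspect(StrategyDemo.run(:one_for_one))   # → %{a: :kept, b: :restarted, c: :kept}
IO.inspect(StrategyDemo.run(:one_for_all))   # → %{a: :restarted, b: :restarted, c: :restarted}
IO.inspect(StrategyDemo.run(:rest_for_one))  # → %{a: :kept, b: :restarted, c: :restarted}
```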
📌 Supervisor never restarts itself
Its parent supervisor does.
6️⃣ What happens if the parent (supervisor) crashes?
Let’s say:
RootSupervisor
└── AppSupervisor
├── Worker A
└── Worker B
AppSupervisor crashes
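- Worker A and Worker B are terminated (they are linked to AppSupervisor)
- RootSupervisor receives the exit signal as a message
- RootSupervisor restarts AppSupervisor
- The new AppSupervisor starts fresh
- The new AppSupervisor starts new copies of its children (A, B)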
📌 Everything below is restarted
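The tree above can be reproduced in a few lines to watch this happen. A sketch, assuming Agents as stand-in workers and unnamed supervisors; `TreeDemo` and its helpers are hypothetical:

```elixir
defmodule TreeDemo do
  # Builds root -> app supervisor -> two workers, kills the app supervisor,
  # and reports what the tree looks like afterwards.
  def run do
    workers =
      for id <- [:worker_a, :worker_b] do
        %{id: id, start: {Agent, :start_link, [fn -> 0 end]}}
      end

    app_spec = %{
      id: :app_supervisor,
      type: :supervisor,
      start: {Supervisor, :start_link, [workers, [strategy: :one_for_one]]}
    }

    {:ok, root} = Supervisor.start_link([app_spec], strategy: :one_for_one)

    old_app = child_pid(root, :app_supervisor)
    old_workers = worker_pids(old_app)
    refs = Enum.map(old_workers, &Process.monitor/1)

    Process.exit(old_app, :kill)

    # The old workers die because they are linked to the killed supervisor.
    old_workers_died =
      Enum.all?(refs, fn ref ->
        receive do
          {:DOWN, ^ref, :process, _pid, _reason} -> true
        after
          1_000 -> false
        end
      end)

    new_app = await_new_child(root, :app_supervisor, old_app)
    new_workers = worker_pids(new_app)
    Supervisor.stop(root)

    %{
      old_workers_died: old_workers_died,
      app_supervisor_replaced: new_app != old_app,
      fresh_workers:
        length(new_workers) == 2 and
          Enum.all?(new_workers, &(&1 not in old_workers))
    }
  end

  defp worker_pids(sup) do
    for {_id, pid, _type, _modules} <- Supervisor.which_children(sup),
        is_pid(pid),
        do: pid
  end

  defp child_pid(sup, id) do
    Enum.find_value(Supervisor.which_children(sup), fn
      {^id, pid, _type, _modules} when is_pid(pid) -> pid
      _ -> nil
    end)
  end

  # Polls until the parent reports a new pid for the given child id.
  defp await_new_child(sup, id, old_pid) do
    case child_pid(sup, id) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_child(sup, id, old_pid)
    end
  end
end

IO.inspect(TreeDemo.run())
```

All three values in the returned map should be true: the old workers went down with their supervisor, and the replacement supervisor came back with two brand new workers.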
7️⃣ What happens to process state on restart?
This is CRITICAL
Process state is LOST on crash
When a process crashes:
- Heap is destroyed
- Mailbox is destroyed
- State is gone
Restart = new process, new PID.
So how does Elixir handle state?
Option 1: Rebuild from source of truth (most common)
@impl true
def init(_args) do
  # load_from_db/0 stands in for whatever reads your source of truth
  state = load_from_db()
  {:ok, state}
end
Examples of sources of truth:
- Database
- ETS (in a table owned by a process that outlives the worker)
- External service
- File
- Cache
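Option 1 can be sketched end to end with a file standing in for the database. Everything here is an assumption for illustration: `FileBackedCounter`, the file format, and the choice to persist on every increment.

```elixir
defmodule FileBackedCounter do
  use GenServer

  def start_link(path), do: GenServer.start_link(__MODULE__, path)
  def increment(pid), do: GenServer.call(pid, :increment)

  @impl true
  def init(path) do
    # Rebuild state from the source of truth instead of starting empty.
    {:ok, %{path: path, count: load(path)}}
  end

  @impl true
  def handle_call(:increment, _from, state) do
    count = state.count + 1
    # Persist before replying, so a crash after the reply loses nothing.
    File.write!(state.path, Integer.to_string(count))
    {:reply, count, %{state | count: count}}
  end

  defp load(path) do
    case File.read(path) do
      {:ok, contents} -> String.to_integer(String.trim(contents))
      {:error, :enoent} -> 0
    end
  end
end

path = Path.join(System.tmp_dir!(), "counter_#{System.unique_integer([:positive])}.txt")

# GenServer.start/2 (no link) so killing the process does not take us down.
{:ok, first} = GenServer.start(FileBackedCounter, path)
FileBackedCounter.increment(first)
FileBackedCounter.increment(first)
Process.exit(first, :kill)

# The replacement is a brand new process, but it picks up the old count.
{:ok, second} = GenServer.start(FileBackedCounter, path)
IO.inspect(FileBackedCounter.increment(second))  # → 3
File.rm(path)
```

Under a real supervisor the second start would happen automatically; it is done by hand here to keep the example short.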
Option 2: Externalize state
Instead of holding state in memory:
- ETS tables (owned by a process that outlives the worker, since a table is deleted when its owner dies)
- Mnesia
- Redis
- Postgres
Workers become stateless coordinators.
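A sketch of Option 2 with ETS, assuming the table is created by the calling process (in a real app this would typically be a long-lived owner such as a supervisor or a dedicated table-owner process). `EtsCounter` is a hypothetical name:

```elixir
defmodule EtsCounter do
  use GenServer

  # `table` is an ETS table created and owned by some longer-lived process.
  def start_link(table), do: GenServer.start_link(__MODULE__, table)
  def increment(pid), do: GenServer.call(pid, :increment)

  # The only thing the worker holds is a reference to the table.
  @impl true
  def init(table), do: {:ok, table}

  @impl true
  def handle_call(:increment, _from, table) do
    # update_counter/4 inserts the default {:count, 0} if the key is
    # missing, then adds 1 and returns the new value.
    {:reply, :ets.update_counter(table, :count, 1, {:count, 0}), table}
  end
end

# :public lets the worker write to a table it does not own.
table = :ets.new(:counter_state, [:set, :public])

# GenServer.start/2 (no link) so killing the worker does not take us down.
{:ok, first} = GenServer.start(EtsCounter, table)
EtsCounter.increment(first)
EtsCounter.increment(first)
Process.exit(first, :kill)

{:ok, second} = GenServer.start(EtsCounter, table)
IO.inspect(EtsCounter.increment(second))  # → 3
```

The count survives because the worker never owned it. If the worker had created the table itself, the table would have been deleted along with it.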
Option 3: Event sourcing (advanced)
- Persist every event
- Replay on restart
- Rebuild state deterministically
Used in:
- Financial systems
- Workflow engines
- CRMs
- Distributed systems
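The replay step of Option 3 is just a fold over the event log. A minimal sketch, assuming a made-up `Ledger` module with two event shapes and an in-memory list standing in for the persisted log:

```elixir
defmodule Ledger do
  # Applies one event to the current balance.
  def apply_event({:deposited, amount}, balance), do: balance + amount
  def apply_event({:withdrew, amount}, balance), do: balance - amount

  # Replaying the full log from an empty state reproduces the balance.
  def replay(events), do: Enum.reduce(events, 0, &apply_event/2)
end

events = [{:deposited, 100}, {:withdrew, 30}, {:deposited, 5}]
IO.inspect(Ledger.replay(events))  # → 75
```

A restarted process would call something like `Ledger.replay/1` from init/1 with the events it reads back from storage.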
8️⃣ What happens to children when parent restarts?
| Situation | Result |
|---|---|
| Child crashes | Supervisor decides |
| Parent crashes | Children die |
| Parent restarts | Children restart |
| Child state | Lost |
| Parent state | Lost |
📌 This is why state must be recoverable
9️⃣ Why this model is actually powerful
Instead of:
- Locks
- Try/catch everywhere
- Defensive programming
- Zombie threads
Elixir says:
Let it crash → restart cleanly → recover state
This is how:
- WhatsApp runs millions of processes
- Telecom systems run for years
- Fault isolation stays sane
🔑 Final mental model (memorize this)
- Links propagate crashes
- Trapping converts exit signals into messages
- Supervisors trap exits
- Supervisors restart children
- Supervisors don’t restart themselves; their parent supervisor does
- Restarted processes lose state
- State must be external or reconstructable
One-sentence summary
Linked processes share fate; trapping converts exits to messages; supervisors trap exits and restart children based on strategy, but restarted processes are brand new and must rebuild any state they previously held.