
How Process Linking and Supervision Really Work in Elixir

1️⃣ What is process linking (at the lowest level)?

Linking = shared fate

When two processes are linked:

spawn_link(fn -> work() end)

or

Process.link(pid)

they form a bidirectional failure relationship.

Rule (very important)

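If one linked process exits abnormally, the other receives an exit signal and exits too
(unless it is trapping exits). A normal exit (reason :normal) does not take linked processes down.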


Case A: Parent and child are linked (the default with spawn_link and OTP start_link functions; plain spawn does not link)

parent ─── linked ─── child

Scenario 1: Child crashes

child crashes → exit signal sent → parent crashes

Scenario 2: Parent crashes

parent crashes → exit signal sent → child crashes

📌 Linking is symmetric

This is intentional:

  • No orphan processes
  • Fail fast
  • Fail loudly
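Both scenarios can be checked in a few lines. This is only a sketch: the LinkDemo module, its function names, and the exit reasons :boom and :parent_down are made up for illustration. It uses monitors, which observe a process without linking to it, so the calling process can watch the crash from a safe distance.

```elixir
defmodule LinkDemo do
  # Scenario 1: the child crashes and the link takes the parent down with it.
  def child_crashes do
    {parent, ref} =
      spawn_monitor(fn ->
        spawn_link(fn -> exit(:boom) end)
        # Without the link, the parent would wait here forever.
        Process.sleep(:infinity)
      end)

    receive do
      {:DOWN, ^ref, :process, ^parent, reason} -> reason
    after
      1_000 -> :timeout
    end
  end

  # Scenario 2: the parent crashes and the link takes the child down with it.
  def parent_crashes do
    caller = self()

    parent =
      spawn(fn ->
        child = spawn_link(fn -> Process.sleep(:infinity) end)
        send(caller, {:child, child})

        # Wait until the caller is monitoring the child, then crash.
        receive do
          :crash -> exit(:parent_down)
        end
      end)

    receive do
      {:child, child} ->
        ref = Process.monitor(child)
        send(parent, :crash)

        receive do
          {:DOWN, ^ref, :process, ^child, reason} -> reason
        after
          1_000 -> :timeout
        end
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(LinkDemo.child_crashes())   # prints :boom
IO.inspect(LinkDemo.parent_crashes())  # prints :parent_down
```

In both directions the surviving side exits with the same reason as the process that crashed, which is what "shared fate" means in practice.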

2️⃣ What is exit trapping?

Normally, an exit signal with any reason other than :normal kills the process that receives it.

But a process can say:

Process.flag(:trap_exit, true)

Now exit signals become messages instead of fatal errors. The one exception is :kill sent with Process.exit(pid, :kill), which can never be trapped.


With trapping enabled

Child crashes → parent receives a message

{:EXIT, child_pid, reason}

instead of dying.
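Here is a small sketch of that behaviour. TrapDemo and the :boom reason are hypothetical names used only for this example; the parent runs in its own process so the experiment does not touch the caller's links.

```elixir
defmodule TrapDemo do
  def run do
    caller = self()

    spawn(fn ->
      # From here on, exit signals arrive in the mailbox as {:EXIT, pid, reason}.
      Process.flag(:trap_exit, true)
      child = spawn_link(fn -> exit(:boom) end)

      receive do
        # The parent is still alive, so it can report what it saw.
        {:EXIT, ^child, reason} -> send(caller, {:trapped, reason})
      end
    end)

    receive do
      {:trapped, reason} -> {:trapped, reason}
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(TrapDemo.run())  # prints {:trapped, :boom}
```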

Important distinctions

Scenario                        Without trap             With trap
Linked process crashes          You crash                You get a message
You crash                       Linked process crashes   Linked process crashes
Linked process exits :normal    Ignored                  You get {:EXIT, pid, :normal}

📌 Trapping is asymmetric — only the trapping process is protected.
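Two edge cases are worth verifying by experiment: what a trapping process sees when a linked process exits normally, and whether trapping protects against :kill. The sketch below checks both. ExitReasons and its function names are invented for this example.

```elixir
defmodule ExitReasons do
  # A trapping process receives a linked process's normal exit as a message.
  def normal_exit_when_trapping do
    caller = self()

    spawn(fn ->
      Process.flag(:trap_exit, true)
      child = spawn_link(fn -> :ok end)

      receive do
        {:EXIT, ^child, reason} -> send(caller, {:got, reason})
      end
    end)

    receive do
      {:got, reason} -> reason
    after
      1_000 -> :timeout
    end
  end

  # :kill sent with Process.exit/2 cannot be trapped; the target dies with reason :killed.
  def kill_ignores_trapping do
    {pid, ref} =
      spawn_monitor(fn ->
        Process.flag(:trap_exit, true)
        Process.sleep(:infinity)
      end)

    Process.exit(pid, :kill)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(ExitReasons.normal_exit_when_trapping())  # prints :normal
IO.inspect(ExitReasons.kill_ignores_trapping())      # prints :killed
```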


Why trapping exists

To allow:

  • Supervisors
  • Fault monitors
  • Restart logic
  • Controlled recovery

3️⃣ Why you should almost never manually trap exits

Because OTP already solved this with Supervisors.

Manual trapping is:

  • Easy to get wrong
  • Hard to reason about
  • Rarely needed in application code

4️⃣ How Supervisors actually work (important)

A supervisor is just a process that:

  1. Traps exits
  2. Links to children
  3. Restarts them based on a strategy

That’s it.


Supervisor crash behavior

Supervisor does NOT die when child crashes

Because it has:

Process.flag(:trap_exit, true)

So instead of dying, it receives:

{:EXIT, child_pid, reason}

and decides what to do.
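To make the three steps concrete, here is a deliberately tiny hand-rolled supervisor. It is a teaching sketch and not how OTP's Supervisor is implemented: TinySupervisor, its max_restarts argument, and the demo function are all hypothetical, and a real supervisor also handles shutdown, restart intensity over time, and multiple children.

```elixir
defmodule TinySupervisor do
  # Starts a supervisor process that keeps one child alive,
  # restarting it at most `max_restarts` times.
  def start(child_fun, max_restarts) do
    spawn(fn ->
      Process.flag(:trap_exit, true)                        # 1. trap exits
      loop(spawn_link(child_fun), child_fun, max_restarts)  # 2. link to the child
    end)
  end

  defp loop(child, child_fun, restarts_left) do
    receive do
      # The child finished on its own; nothing to restart.
      {:EXIT, ^child, :normal} ->
        :ok

      # 3. The child crashed: start a replacement and keep watching.
      {:EXIT, ^child, _reason} when restarts_left > 0 ->
        loop(spawn_link(child_fun), child_fun, restarts_left - 1)

      # Out of restarts: give up and crash, so whoever supervises us can react.
      {:EXIT, ^child, reason} ->
        exit({:too_many_restarts, reason})
    end
  end

  # Demo: a child that reports its pid and then crashes immediately.
  def demo do
    caller = self()

    start(
      fn ->
        send(caller, {:started, self()})
        exit(:boom)
      end,
      2
    )

    # One initial start plus two restarts gives three different pids.
    for _ <- 1..3 do
      receive do
        {:started, pid} -> pid
      after
        1_000 -> :timeout
      end
    end
  end
end

IO.inspect(TinySupervisor.demo())
```

Each entry in the printed list is a different pid, because every restart is a brand-new process.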


5️⃣ Supervisor restart strategies (key concept)

:one_for_one (most common)

child A crashes → restart only child A

:one_for_all

child A crashes → terminate all children → restart all

:rest_for_one

child A crashes → restart A + all started after A

📌 Supervisor never restarts itself
Its parent supervisor does.
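The strategies are easier to compare when you can watch which pids change. The sketch below starts two Agents (:a first, then :b) under a real Supervisor, kills one, and reports which children came back with a new pid. StrategyDemo and its helper names are invented for this post; the polling loop is just a simple way to wait for the restart in a demo.

```elixir
defmodule StrategyDemo do
  # Returns the ids of the children that were replaced after `victim` was killed.
  def restarted_ids(strategy, victim) do
    children = [
      Supervisor.child_spec({Agent, fn -> :state_a end}, id: :a),
      Supervisor.child_spec({Agent, fn -> :state_b end}, id: :b)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    before = child_pids(sup)

    Process.exit(before[victim], :kill)
    now = await_restart(sup, victim, before[victim], 100)

    Supervisor.stop(sup)
    for id <- [:a, :b], now[id] != before[id], do: id
  end

  defp child_pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
  end

  # Polls until the victim has a new pid (fails with a FunctionClauseError after about a second).
  defp await_restart(sup, id, old_pid, tries) when tries > 0 do
    pids = child_pids(sup)

    case pids[id] do
      pid when is_pid(pid) and pid != old_pid ->
        pids

      _ ->
        Process.sleep(10)
        await_restart(sup, id, old_pid, tries - 1)
    end
  end
end

IO.inspect(StrategyDemo.restarted_ids(:one_for_one, :a))   # prints [:a]
IO.inspect(StrategyDemo.restarted_ids(:one_for_all, :a))   # prints [:a, :b]
IO.inspect(StrategyDemo.restarted_ids(:rest_for_one, :a))  # prints [:a, :b]
IO.inspect(StrategyDemo.restarted_ids(:rest_for_one, :b))  # prints [:b]
```

Note how :rest_for_one depends on start order: killing :a also replaces :b, which was started after it, while killing :b leaves :a alone.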


6️⃣ What happens if the parent (supervisor) crashes?

Let’s say:

RootSupervisor
  └── AppSupervisor
        ├── Worker A
        └── Worker B

AppSupervisor crashes

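  1. Worker A and Worker B receive exit signals through their links to AppSupervisor and terminate
  2. RootSupervisor receives the exit signal as a message (it traps exits)
  3. RootSupervisor restarts AppSupervisor
  4. AppSupervisor starts fresh
  5. AppSupervisor starts new copies of its children (A, B)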

📌 Everything below is restarted
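A rough way to see this for yourself is to build the same two-level tree, kill the middle supervisor, and compare pids. In this sketch the workers are plain Agents, and TreeDemo and the registered name :demo_app_sup are assumptions made for the example.

```elixir
defmodule TreeDemo do
  def run do
    workers = [
      Supervisor.child_spec({Agent, fn -> :state_a end}, id: :worker_a),
      Supervisor.child_spec({Agent, fn -> :state_b end}, id: :worker_b)
    ]

    # The inner supervisor is itself just a child of the root supervisor.
    app_supervisor = %{
      id: :app_supervisor,
      type: :supervisor,
      start: {Supervisor, :start_link, [workers, [strategy: :one_for_one, name: :demo_app_sup]]}
    }

    {:ok, root} = Supervisor.start_link([app_supervisor], strategy: :one_for_one)

    old_sup = Process.whereis(:demo_app_sup)
    old_workers = worker_pids(old_sup)

    # Simulate a crash of the inner supervisor.
    Process.exit(old_sup, :kill)
    new_sup = await_new_pid(:demo_app_sup, old_sup, 100)
    new_workers = worker_pids(new_sup)

    Supervisor.stop(root)

    %{
      supervisor_replaced: new_sup != old_sup,
      workers_replaced:
        Enum.all?([:worker_a, :worker_b], fn id ->
          is_pid(new_workers[id]) and new_workers[id] != old_workers[id]
        end)
    }
  end

  defp worker_pids(sup) do
    for {id, pid, :worker, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
  end

  # Polls until `name` is registered to a new process (fails after about a second).
  defp await_new_pid(name, old_pid, tries) when tries > 0 do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid, tries - 1)
    end
  end
end

IO.inspect(TreeDemo.run())
# prints %{supervisor_replaced: true, workers_replaced: true}
```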


7️⃣ What happens to process state on restart?

This is CRITICAL

Process state is LOST on crash

When a process crashes:

  • Heap is destroyed
  • Mailbox is destroyed
  • State is gone

Restart = new process, new PID.
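The loss of state is easy to demonstrate with a supervised counter. This is a minimal sketch: StateLossDemo and the registered name :demo_counter are hypothetical, and the Agent simply holds an integer that starts at 0.

```elixir
defmodule StateLossDemo do
  def run do
    counter = %{
      id: :counter,
      start: {Agent, :start_link, [fn -> 0 end, [name: :demo_counter]]}
    }

    {:ok, sup} = Supervisor.start_link([counter], strategy: :one_for_one)

    # Build up some in-memory state.
    Agent.update(:demo_counter, &(&1 + 5))
    value_before = Agent.get(:demo_counter, & &1)

    # Crash the process and wait for the supervisor to start a replacement.
    old_pid = Process.whereis(:demo_counter)
    Process.exit(old_pid, :kill)
    new_pid = await_new_pid(:demo_counter, old_pid, 100)

    # The replacement ran its start function again, so it is back at 0.
    value_after = Agent.get(new_pid, & &1)
    Supervisor.stop(sup)

    {value_before, value_after, new_pid != old_pid}
  end

  # Polls until `name` is registered to a new process (fails after about a second).
  defp await_new_pid(name, old_pid, tries) when tries > 0 do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid, tries - 1)
    end
  end
end

IO.inspect(StateLossDemo.run())  # prints {5, 0, true}
```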


So how does Elixir handle state?

Option 1: Rebuild from source of truth (most common)

def init(_) do
  # load_from_db/0 is a placeholder for your own loading function
  state = load_from_db()
  {:ok, state}
end

Examples:

  • Database
  • ETS (a table owned by a different, longer-lived process, since a table is deleted when its owner dies)
  • External service
  • File
  • Cache
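As a slightly fuller sketch of Option 1, here is a complete GenServer whose init callback rebuilds state from a loader. SourceBackedServer and the loader argument are assumptions made for this example: the loader is a zero-arity function standing in for whatever reads your real source of truth.

```elixir
defmodule SourceBackedServer do
  use GenServer

  # `loader` is a hypothetical zero-arity function, e.g. one that queries a database.
  def start_link(loader), do: GenServer.start_link(__MODULE__, loader)

  def state(pid), do: GenServer.call(pid, :state)

  @impl true
  def init(loader) do
    # Runs on the first start and again on every restart,
    # so a fresh process always begins from the source of truth.
    {:ok, loader.()}
  end

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}
end

{:ok, pid} = SourceBackedServer.start_link(fn -> %{items: [:apple]} end)
IO.inspect(SourceBackedServer.state(pid))  # prints %{items: [:apple]}
```

If loading is slow, init blocks the supervisor while it runs, so in a real system you may want to defer the load with handle_continue.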

Option 2: Externalize state

Instead of holding state in memory:

  • ETS tables (owned by a process that outlives the worker)
  • Mnesia
  • Redis
  • Postgres

Workers become stateless coordinators.
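Here is one way this could look with ETS. It is a sketch under a few assumptions: EtsCounter, EtsDemo, and the table name :ets_counter_demo are invented, and the table is created by the calling process, which plays the role of a long-lived owner. The worker keeps nothing in its own state, so killing it loses nothing.

```elixir
defmodule EtsCounter do
  use GenServer

  def table, do: :ets_counter_demo

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def increment, do: GenServer.call(__MODULE__, :increment)

  @impl true
  def init(nil), do: {:ok, nil}

  @impl true
  def handle_call(:increment, _from, state) do
    # The count lives in ETS, not in this process; insert {:count, 0} if missing.
    value = :ets.update_counter(table(), :count, 1, {:count, 0})
    {:reply, value, state}
  end
end

defmodule EtsDemo do
  def run do
    # The caller owns the table, so the table survives worker crashes.
    :ets.new(EtsCounter.table(), [:named_table, :public, :set])
    {:ok, sup} = Supervisor.start_link([EtsCounter], strategy: :one_for_one)

    EtsCounter.increment()
    EtsCounter.increment()

    old_pid = Process.whereis(EtsCounter)
    Process.exit(old_pid, :kill)
    await_new_pid(EtsCounter, old_pid, 100)

    # The restarted worker continues from the externalized count.
    value = EtsCounter.increment()

    Supervisor.stop(sup)
    :ets.delete(EtsCounter.table())
    value
  end

  # Polls until `name` is registered to a new process (fails after about a second).
  defp await_new_pid(name, old_pid, tries) when tries > 0 do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid, tries - 1)
    end
  end
end

IO.inspect(EtsDemo.run())  # prints 3
```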


Option 3: Event sourcing (advanced)

  • Persist every event
  • Replay on restart
  • Rebuild state deterministically

Used in:

  • Financial systems
  • Workflow engines
  • CRMs
  • Distributed systems
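The core of event sourcing fits in a few lines: state is a fold over the event log. The Ledger module and its event shapes below are hypothetical, and a real system would read the events from durable storage rather than a literal list.

```elixir
defmodule Ledger do
  # Applying one event to the current balance is a pure function.
  def apply_event({:deposited, amount}, balance), do: balance + amount
  def apply_event({:withdrew, amount}, balance), do: balance - amount

  # Replaying the whole log from the initial state rebuilds the same balance every time.
  def replay(events), do: Enum.reduce(events, 0, &apply_event/2)
end

events = [{:deposited, 100}, {:withdrew, 30}, {:deposited, 5}]
IO.inspect(Ledger.replay(events))  # prints 75
```

A restarted process would call something like Ledger.replay/1 from its init callback to get back to where it was.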

8️⃣ What happens to children when parent restarts?

Situation          Result
Child crashes      Supervisor decides
Parent crashes     Children die
Parent restarts    Children restart
Child state        Lost
Parent state       Lost

📌 This is why state must be recoverable


9️⃣ Why this model is actually powerful

Instead of:

  • Locks
  • Try/catch everywhere
  • Defensive programming
  • Zombie threads

Elixir says:

Let it crash → restart cleanly → recover state

This is how:

  • WhatsApp (built on Erlang, which runs on the same BEAM VM as Elixir) runs millions of processes
  • Telecom systems run for years
  • Fault isolation stays sane

🔑 Final mental model (memorize this)

  1. Links propagate crashes
  2. Trapping converts exit signals into messages
  3. Supervisors trap exits
  4. Supervisors restart children
  5. Supervisors don’t restart themselves; their parent supervisor does
  6. Restarted processes lose state
  7. State must be external or reconstructable

One-sentence summary

Linked processes share fate; trapping converts exits to messages; supervisors trap exits and restart children based on strategy, but restarted processes are brand new and must rebuild any state they previously held.
