
Hacker News · Mar 2, 2026 · Collected from RSS
Article URL: https://variantsystems.io/blog/beam-otp-process-concurrency
Comments URL: https://news.ycombinator.com/item?id=47214063
Points: 16 · Comments: 4
Every few months, someone in the AI or distributed systems space announces a new framework for running concurrent, stateful agents. It has isolated state. Message passing. A supervisor that restarts things when they fail. The BEAM language communities watch, nod, and go back to work.

This keeps happening because process-based concurrency solves a genuinely hard problem, and the BEAM virtual machine has been solving it since 1986. Not as a library. Not as a pattern you adopt. As the runtime itself.

Dillon Mulroy put it plainly:

Thirty thousand people saw that and a lot of them felt it. The Python AI ecosystem is building agent frameworks that independently converge on the same architecture — isolated processes, message passing, supervision hierarchies, fault recovery. The patterns aren’t similar to OTP by coincidence. They’re similar because the problem demands this shape.

This post isn’t the hot take about why Erlang was right. It’s the guide underneath that take. We’ll start from first principles — what concurrency actually means, why shared state breaks everything, and how processes change the game. By the end, you’ll understand why OTP’s patterns keep getting reinvented and why the BEAM runtime makes them work in ways other platforms can’t fully replicate.

We write Elixir professionally. Our largest production system — a healthcare SaaS platform — runs on 80,000+ lines of Elixir handling real-time scheduling, AI-powered clinical documentation, and background job orchestration. This isn’t theoretical for us. But we’ll explain it like we’re explaining it to ourselves when we first encountered it.

## The concurrency problem, stated plainly

Your program needs to do multiple things at once. Maybe it’s handling thousands of web requests simultaneously. Maybe it’s running AI agents that each maintain their own conversation state. Maybe it’s processing audio transcriptions while serving a real-time dashboard.

The hardware can do this. Modern CPUs have multiple cores.
The question is how your programming model lets you use them.

There are two fundamental approaches.

**Shared state with locks.** Multiple threads access the same memory. You prevent corruption with mutexes, semaphores, and locks. This is what most languages do — Java, C++, Go (with goroutines, but shared memory is still the default model), Python (with the GIL making it worse), Rust (with the borrow checker making it safer).

The problem with shared state isn’t that it doesn’t work. It’s that it works until it doesn’t. Race conditions are the hardest bugs to reproduce, the hardest to test for, and the hardest to reason about. The more concurrent your system gets, the more lock contention slows everything down. And a single corrupted piece of shared memory can cascade through the entire system.

**Isolated state with message passing.** Each concurrent unit has its own memory. The only way to communicate is by sending messages. No shared memory, no locks, no races.

This is the actor model. Carl Hewitt proposed it in 1973. Erlang implemented it as a runtime in 1986. Every few years, the rest of the industry rediscovers it.

## What a “process” means on BEAM

When BEAM programmers say “process,” they don’t mean an operating system process. OS processes are heavy — megabytes of memory, expensive to create, expensive to context-switch. They don’t mean threads either, which share memory and need synchronization. And they don’t mean green threads or coroutines, which are lighter but still typically share a heap and lack true isolation.

A BEAM process is something different:

- **~2KB of memory at creation.** You can spawn millions of them on a single machine.
- **Own heap, own stack, own garbage collector.** When a process is collected, nothing else pauses. No stop-the-world GC events affecting the entire system.
- **Preemptively scheduled.** The BEAM scheduler gives each process a budget of approximately 4,000 “reductions” (roughly, function calls) before switching to the next one. No process can hog the CPU. This happens at the VM level — you can’t opt out of it.
- **Completely isolated.** A process cannot access another process’s memory. Period. The only way to interact is by sending a message.

This last point is the one that changes how you think about software. In most languages, when something goes wrong in one part of your program, the blast radius is unpredictable. A null pointer in a thread can corrupt shared state that other threads depend on. An unhandled exception in a Node.js async handler can crash the entire process — every connection, every user, everything.

On BEAM, the blast radius of a failure is exactly one process. Always.

```elixir
# Spawn a process that will crash
spawn(fn ->
  # This process does some work...
  raise "something went wrong"
  # This process dies. Nothing else is affected.
end)

# This code continues running, unaware and unharmed
IO.puts("Still here.")
```

This isn’t a try/catch hiding the error. The process that crashed is gone — its memory is reclaimed, its state is released. Everything else keeps running. The question is: who notices, and what happens next?

## Message passing and mailboxes

If processes can’t share memory, how do they communicate?

Every BEAM process has a mailbox — a queue of incoming messages. You send a message to a process using its process identifier (PID). The message is copied into the recipient’s mailbox. The sender doesn’t wait (it’s asynchronous by default). The recipient processes messages from its mailbox when it’s ready.

```elixir
# Process A sends a message to Process B
send(process_b_pid, {:temperature_reading, 23.5, ~U[2026-02-22 10:00:00Z]})

# Process B receives it when ready
receive do
  {:temperature_reading, temp, timestamp} ->
    IO.puts("Got #{temp}°C at #{timestamp}")
end
```

A few things to notice:

**Messages are copied, not shared.** When you send a message, the data is copied into the recipient’s heap. This sounds expensive, and for very large messages, it can be. But it means there’s zero possibility of two processes modifying the same data.
The tradeoff is worth it — you buy correctness by default.

**Pattern matching on receive.** The `receive` block uses Elixir’s pattern matching to selectively pull messages from the mailbox. Messages that don’t match stay in the mailbox for later. This means a process can handle different message types in different contexts without any routing logic.

**Backpressure is built in.** If a process receives messages faster than it can handle them, the mailbox grows. This is visible and monitorable. You can inspect any process’s mailbox length, set up alerts, and make architectural decisions about it. Contrast this with thread-based systems where overload manifests as increasing latency, deadlocks, or OOM crashes — symptoms that are harder to diagnose and attribute.

The message-passing model creates a natural architecture. Each process is a self-contained unit with its own state, handling one thing well. Processes compose into systems through messages — like microservices, but within a single runtime, with nanosecond message delivery instead of network hops.

## “Let it crash” — resilience as architecture

This is the most misunderstood concept in the BEAM ecosystem.

“Let it crash” does not mean “ignore errors.” It does not mean “don’t handle edge cases.” It means: separate the code that does work from the code that handles failure.

In most languages, business logic and error recovery are interleaved:

```python
def process_payment(order):
    try:
        customer = fetch_customer(order.customer_id)
    except DatabaseError:
        logger.error("DB failed fetching customer")
        return retry_later(order)
    except CustomerNotFound:
        logger.error("Customer missing")
        return mark_order_failed(order)

    try:
        charge = payment_gateway.charge(customer, order.total)
    except PaymentDeclined:
        notify_customer(customer, "Payment declined")
        return mark_order_failed(order)
    except GatewayTimeout:
        logger.error("Payment gateway timeout")
        return retry_later(order)
    except RateLimitError:
        sleep(1)
        return process_payment(order)  # retry

    try:
        send_confirmation(customer, charge)
    except EmailError:
        logger.warning("Confirmation email failed")
        # Continue anyway? Or fail? Hard to decide here.

    return mark_order_complete(order)
```

Every function call is wrapped in error handling. The happy path — the actual business logic — is buried under defensive code. And every new failure mode adds another branch. The code becomes harder to read, harder to test, and harder to change.

On BEAM, you write the happy path:

```elixir
defmodule PaymentProcessor do
  use GenServer

  def handle_call({:process, order}, _from, state) do
    customer = Customers.fetch!(order.customer_id)
    charge = PaymentGateway.charge!(customer, order.total)
    Notifications.send_confirmation!(customer, charge)
    {:reply, :ok, state}
  end
end
```

If any of those calls fail, the process crashes. That’s not a bug — it’s the design. A supervisor (which we’ll get to next) is watching this process. It knows what to do when it crashes: restart it, retry the operation, or escalate to a higher-level supervisor.

The business logic is clean because error recovery is a separate concern, handled by a separate process. This isn’t about being reckless. It’s about putting recovery logic where it belongs — in the supervision tree, not tangled into every function.

Here’s the key insight: the process that crashes loses its state, but that’s fine because you designed for it. You put critical state in a database or an ETS table. The process itself is cheap, stateless enough to restart cleanly, and focused entirely on doing its job.

## Supervision trees

A supervisor is a process whose only job is watching other processes and reacting when they die.
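Supervisors are built on a lower-level primitive: process monitoring. Here is a minimal sketch of one process noticing another’s death — this uses the standard `spawn_monitor/1` function and the `:DOWN` message, and is our illustration rather than code from the article:

```elixir
# spawn_monitor/1 starts a process and monitors it in one step.
# When the monitored process dies, the watcher receives a :DOWN
# message instead of crashing with it.
{pid, ref} = spawn_monitor(fn -> exit(:boom) end)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} ->
    IO.puts("watched process died: #{inspect(reason)}")
after
  1_000 ->
    IO.puts("no :DOWN message received")
end

IO.puts("watcher still running")
```

A supervisor is, roughly, a process that holds a policy on top of this kind of notification (in practice via links and trapped exits) deciding what to do when a child dies.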
Supervisors are organized into trees — a supervisor can supervise other supervisors, creating a hierarchy of recovery strategies.

```elixir
defmodule MyApp.Supervisor do
  use Supervisor

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def init(_opts) do
    children = [
      {PaymentProcessor, []},
      {NotificationService, []},
      {MetricsCollector, []}
    ]

    # If any child crashes, restart only that child
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```
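To see the restart-only-that-child behavior concretely, here is a self-contained sketch — the `:counter` name and the `Agent` child are illustrative stand-ins, not part of the article’s application:

```elixir
# A one_for_one supervisor with a single stand-in child: an Agent
# holding an integer, registered under the illustrative name :counter.
children = [
  %{id: :counter, start: {Agent, :start_link, [fn -> 0 end, [name: :counter]]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

first_pid = Process.whereis(:counter)

# Kill the child to simulate a crash...
Process.exit(first_pid, :kill)
Process.sleep(50)

# ...and the supervisor has started a fresh child under the same name.
second_pid = Process.whereis(:counter)
IO.puts("restarted: #{is_pid(second_pid) and second_pid != first_pid}")
```

In a real application the supervisor is not started ad hoc like this; it sits in the application’s supervision tree, typically as a child of the root supervisor.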