Why Your Load Balancer Still Sends Traffic to Dead Backends

Hacker News · Feb 23, 2026 · Collected from RSS

Summary

Article URL: https://singh-sanjay.com/2026/01/12/health-checks-client-vs-server-side-lb.html
Comments URL: https://news.ycombinator.com/item?id=47130431
Points: 8 · Comments: 2

Full Article

A service reports healthy. The load balancer believes it. A request lands on it and times out. Another follows. Then ten more. By the time the system reacts, hundreds of requests have drained into a broken instance while users stare at a spinner.

Health checking sounds simple: ask if something is alive, stop sending traffic if it isn't. In practice, the mechanism behind that check, and who performs it, determines how fast your system detects failure, how accurately it responds, and how much of that complexity leaks into your application code. The answer is fundamentally different depending on where load balancing lives: in a central proxy, or in the client itself.

Two Models for Distributing Traffic

Before getting into health checks, it helps to be precise about what each model looks like.

Server-Side Load Balancing

A dedicated proxy sits between clients and the backend fleet. Clients know one address: the load balancer. The load balancer knows the backend pool and decides where each request goes. It is the single point of intelligence: it tracks backend health, maintains connection pools, and routes traffic. Clients are completely unaware of the backend topology; they see one stable address regardless of how many instances sit behind it, or how many fail. HAProxy, NGINX, AWS ALB, and most hardware appliances follow this model.

Client-Side Load Balancing

The routing intelligence moves into the client. Each client holds a local view of the available backend instances, typically populated from a service registry, and makes its own routing decision on every request. There is no proxy in the request path. A service registry keeps the authoritative list of instances; clients subscribe to updates and maintain their own routing tables. gRPC's built-in load balancing, Netflix Ribbon, and LinkedIn's D2 all work this way.
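The client-side model can be sketched in a few lines: each client owns a local routing table and picks an instance per request. This is a minimal illustration, not any particular library's API; the class and method names are made up, the table is updated by hand, and a real system would feed it from the registry subscription and filter by health before picking.

```python
import itertools

class ClientSideBalancer:
    """Minimal sketch of client-side load balancing: the client holds its
    own view of the backend pool and routes without any proxy in the path."""

    def __init__(self, instances):
        self._instances = list(instances)          # local routing table
        self._rr = itertools.cycle(self._instances)

    def update(self, instances):
        # In practice this is driven by service-registry push updates.
        self._instances = list(instances)
        self._rr = itertools.cycle(self._instances)

    def pick(self):
        # Simple round-robin; real clients also skip unhealthy instances here.
        return next(self._rr)

balancer = ClientSideBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])
backend = balancer.pick()   # each client makes this decision independently
```

Note that every client instance holds its own copy of this state, which is exactly where the health-checking complexity discussed below comes from.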
The registry often exposes instance addresses through DNS, which introduces its own propagation delays and failure modes, covered in It's Always DNS.

Health Checking: Who Asks, and How

The two models produce fundamentally different answers to the same question: is this instance healthy?

Health Checking in Server-Side Load Balancing

The load balancer owns health checking entirely. It runs periodic probes against each backend on a fixed schedule: typically a TCP connect, an HTTP request to a /health endpoint, or a custom command. A typical configuration might look like:

Interval: probe every 5 seconds
Timeout: wait up to 2 seconds for a response
Rise threshold: 2 consecutive successes to mark healthy
Fall threshold: 3 consecutive failures to mark unhealthy

These thresholds exist to avoid flapping: toggling an instance in and out of rotation on a single transient failure. The downside is latency. With a 5-second interval and a fall threshold of 3, a hard failure takes up to 15 seconds to detect. During that window, real traffic continues to hit the broken instance.

Once the load balancer marks an instance unhealthy, it removes it from rotation immediately. No client needs to be updated; the change is made in one place, takes effect instantly, and is consistent for all callers.

Health Checking in Client-Side Load Balancing

With no central proxy, health checking is distributed. Each client must independently determine which instances in its local list are safe to use. There are two approaches, and most production systems use both.

Active health checks: the client (or a sidecar process) periodically probes each known instance, just as a server-side load balancer would. The difference is that every client runs its own probe loop. With 500 clients each checking 20 instances every 5 seconds, that is 2,000 probe requests per second hitting your fleet, just for health signals, before any real traffic. Each client also forms its own independent view.
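The rise/fall mechanism described above is a small per-backend state machine. Here is one way it could look, as a sketch under the thresholds given in the example configuration (the class and attribute names are illustrative, not taken from any particular load balancer):

```python
class BackendHealth:
    """Sketch of per-backend rise/fall threshold logic: an instance changes
    state only after enough consecutive probes agree, which prevents a single
    transient failure from flapping it in and out of rotation."""

    def __init__(self, rise=2, fall=3):
        self.rise = rise          # consecutive successes needed to mark healthy
        self.fall = fall          # consecutive failures needed to mark unhealthy
        self.healthy = True
        self._successes = 0
        self._failures = 0

    def record_probe(self, ok):
        if ok:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.rise:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.fall:
                self.healthy = False
        return self.healthy
```

The detection-latency trade-off falls directly out of these numbers: with a 5-second probe interval, a hard failure needs fall = 3 consecutive failed probes, so worst-case detection is roughly 3 × 5 s = 15 s, the window quoted above.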
Two clients probing the same instance at different moments can reach different conclusions, especially during the brief window when an instance is degrading. The fleet's health state is eventually consistent rather than authoritative.

Passive health checks (also called outlier detection or failure tracking) take a different approach: instead of probing, the client watches the outcomes of real requests. A connection refused, a timeout, a stream of 500s: these are signals that something is wrong with that instance. The client marks it unhealthy locally and stops routing to it for a backoff period.

Passive checking has a meaningful advantage: failure detection is immediate. The first failed request triggers the response; there is no polling interval to wait through. The cost is that at least one real request must fail before the client reacts. In high-throughput systems this is usually acceptable; in low-traffic or bursty scenarios it can mean more user-visible errors.

What Each Model Gets Right

Server-side load balancing gives you a single, consistent view of fleet health. Every client gets the same routing decisions without knowing anything about the backend topology. This is operationally simple: health check configuration lives in one place, changes take effect instantly across all callers, and the backend is completely decoupled from the routing logic. At modest scale, a few dozen services and hundreds of clients, this is almost always the right default.

Client-side load balancing trades that simplicity for scale. When you have thousands of services talking to each other at high call rates, a central proxy becomes a bottleneck and a single point of failure. Removing it from the request path reduces latency and eliminates a class of infrastructure failure. Passive health checking gives clients sub-request-latency failure detection that a polling-based central proxy simply cannot match. The cost is real: distributed health state is harder to reason about.
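The passive approach can be sketched as a small tracker a client consults before routing: real request outcomes drive ejection, and an ejected instance returns after a backoff period. Everything here is illustrative (names, the single-failure ejection rule, the 30-second backoff); production implementations layer on consecutive-error thresholds and jitter. The injectable clock exists only to make the sketch testable.

```python
import time

class OutlierTracker:
    """Sketch of passive health checking (outlier detection): no probes,
    just observed outcomes of real requests. Detection is immediate, but
    at least one real request has to fail first."""

    def __init__(self, backoff_s=30.0, clock=time.monotonic):
        self.backoff_s = backoff_s
        self.clock = clock
        self._ejected_until = {}   # instance -> time it may rejoin rotation

    def record(self, instance, ok):
        # A failed real request (refused connection, timeout, 5xx) ejects
        # the instance on the spot; there is no polling interval to wait out.
        if not ok:
            self._ejected_until[instance] = self.clock() + self.backoff_s

    def usable(self, instance):
        return self.clock() >= self._ejected_until.get(instance, 0.0)

tracker = OutlierTracker()
tracker.record("10.0.0.2:8080", ok=False)   # a real request timed out
tracker.usable("10.0.0.2:8080")             # excluded until the backoff expires
```

Because each client keeps its own tracker, two clients can hold different ejection state for the same instance at the same moment, which is the eventual-consistency property described above.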
Two clients can disagree on whether an instance is healthy. Debugging a routing anomaly requires looking at state spread across hundreds of processes rather than one place. And the health check logic itself (thresholds, backoff, jitter) needs to live in every client library, tested and maintained across every language your organization uses.

Choosing Between Them

There is no universal answer. The right model depends on your fleet size, call rates, operational maturity, and how much complexity you can manage in client libraries.

Server-side load balancing is simpler to operate and reason about. For most teams and most services, it is the right starting point.

Client-side load balancing pays off when scale makes a central proxy genuinely painful: when the proxy itself becomes a bottleneck, when you need sub-millisecond failure detection, or when the overhead of a proxy hop is measurable and matters.

Many large systems end up using both: server-side load balancing at the ingress layer, where clients are external and uncontrollable, and client-side load balancing for internal service-to-service calls, where the client library can be standardized. The health checking story in each layer is different, the failure modes are different, and understanding both is what lets you reason clearly about where traffic actually goes when things go wrong.


