High-Level Design Tutorial, Lesson 11: “Scalable systems, HLD interviews & case studies”

Load Balancers

Introduction

A load balancer (LB) distributes incoming traffic across multiple backend servers to improve capacity, fault tolerance, and maintenance flexibility. Clients connect to the LB's virtual IP; the LB forwards requests using algorithms like round-robin, least connections, or consistent hashing.

L4 load balancers (TCP/UDP) route by IP address and port; L7 load balancers (HTTP) route by URL path, headers, and cookies. Cloud providers offer managed LBs (AWS ALB/NLB, Google Cloud Load Balancing) that integrate with health checks and auto-scaling groups.

This lesson covers LB placement in HLD, algorithm selection, health checks, SSL termination, and sticky session trade-offs.

Understanding the topic

Key concepts

L4 (transport): fast, protocol-agnostic, good for database proxies and UDP game traffic.
L7 (application): path-based routing, TLS termination, WAF integration.
Algorithms: round-robin, weighted, least connections, IP hash, consistent hash (cache locality).
Health checks: HTTP /health, TCP connect, or custom probes; unhealthy nodes are removed from rotation.
SSL termination at LB reduces CPU on app servers; re-encrypt to backends optional.
Active-active vs active-passive pairs for LB itself (DNS failover, anycast).
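
The selection algorithms listed above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer implements them: the names `Backend`, `RoundRobin`, and `LeastConnections` are hypothetical, and weighted round-robin is approximated by simply repeating a backend in the cycle according to its weight.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1              # relative share for weighted round-robin
    active_connections: int = 0  # tracked by the least-connections picker


class RoundRobin:
    """Cycle through backends in order; a weight of N repeats a backend N times."""

    def __init__(self, backends):
        expanded = [b for b in backends for _ in range(b.weight)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend that currently serves the fewest connections."""

    def __init__(self, backends):
        self._backends = list(backends)

    def pick(self):
        backend = min(self._backends, key=lambda b: b.active_connections)
        backend.active_connections += 1  # caller releases when the request ends
        return backend

    def release(self, backend):
        backend.active_connections -= 1


# Illustrative usage with three hypothetical servers.
servers = [Backend("s1", weight=2), Backend("s2"), Backend("s3")]
rr = RoundRobin(servers)
print([rr.pick().name for _ in range(4)])

lc = LeastConnections(servers)
first = lc.pick()
print(first.name, first.active_connections)
lc.release(first)
```

Round-robin needs no per-request state, which suits uniform short requests; least connections needs the LB to track open connections, which pays off when request durations vary widely.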

Internal architecture

Architecture overview

text

flowchart LR
  Client --> LB[Load Balancer]
  LB --> S1[Server 1]
  LB --> S2[Server 2]
  LB --> S3[Server 3]

Step-by-step explanation

Internet → DNS → cloud LB (terminates TLS) → target group of N app instances.
The LB probes /ready every 10s and removes an instance from rotation after 3 consecutive failed checks.
Auto-scaling group registers new instances with target group automatically.
Separate internal LB for service-to-service traffic in private VPC.
WebSocket: L7 LB with connection stickiness or dedicated gateway tier.
Global: GeoDNS or anycast LB routes to nearest region; health-based failover.
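
The health-check step above (remove an instance after 3 failures) can be modelled as a small state tracker. This is a sketch under assumptions: the class name `TargetHealth` is hypothetical, failures are counted consecutively, and the `healthy_threshold` of 2 successes before an instance rejoins is an illustrative value that the lesson does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    unhealthy_threshold: int = 3  # from the lesson: remove on 3 failures
    healthy_threshold: int = 2    # assumption: successes needed to rejoin
    healthy: bool = True
    _failures: int = field(default=0, init=False, repr=False)
    _successes: int = field(default=0, init=False, repr=False)

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return True if the target is in rotation."""
        if probe_ok:
            self._failures = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._successes = 0
            self._failures += 1
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


# Illustrative probe sequence: one result per health-check interval.
target = TargetHealth()
for ok in [True, False, False, False, True, True]:
    print(f"probe_ok={ok} in_rotation={target.record(ok)}")
```

A newly scaled instance can be modelled with `TargetHealth(healthy=False)`, so it only receives traffic once its first probes pass.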

Informative example

Spring Boot actuator health for LB probes — separate liveness from readiness:

yaml

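management:
  endpoints:
    web:
      exposure:
        # Expose only the actuator endpoints that probes and monitoring need
        include: health,info,prometheus
  endpoint:
    health:
      probes:
        # Enables /actuator/health/liveness and /actuator/health/readiness
        enabled: true
      group:
        liveness:
          # Liveness reflects only the application's own state
          include: livenessState
        readiness:
          # Readiness also requires the database and Redis to be reachable
          include: readinessState,db,redis

spring:
  datasource:
    url: jdbc:postgresql://db.internal:5432/shop
  data:
    redis:
      host: redis.internal

# AWS ALB target group health check path: /actuator/health/readiness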

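The dependency checks (db, redis) belong only to the readiness group, so a database or Redis outage takes the instance out of LB rotation without failing liveness and triggering a restart loop. Point the LB health check at the readiness path only.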

Real-world use

Real-world use cases

E-commerce web tier during holiday traffic — horizontal pods + ALB.
Banking API gateway cluster with weighted routing for canary releases.
OTT API: L7 path routing /api vs /static to different target groups.
Multi-region failover: DNS LB health checks shift traffic to secondary region.
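
The /api vs /static case above comes down to matching a request path against a rule table. The sketch below assumes longest-prefix matching and hypothetical target group names (`api-servers`, `static-servers`, `web-servers`); real load balancers have their own rule-ordering semantics, often an explicit priority per rule.

```python
# Hypothetical rule table: path prefix -> target group name.
RULES = {
    "/api": "api-servers",
    "/static": "static-servers",
}
DEFAULT_TARGET_GROUP = "web-servers"


def route(path, rules=RULES, default=DEFAULT_TARGET_GROUP):
    """Return the target group for the longest matching path prefix."""
    matches = [
        prefix
        for prefix in rules
        # Match whole path segments only, so "/apiary" does not match "/api".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return default
    return rules[max(matches, key=len)]


for p in ["/api/orders", "/static/app.js", "/checkout"]:
    print(p, "->", route(p))
```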

Best practices

Terminate TLS at LB with modern cipher policies; rotate certs via ACM/Let's Encrypt.
Use least connections for long-lived requests; round-robin for uniform short API calls.
Enable connection draining during deploys (deregistration delay).
Avoid sticky sessions unless required — prefer shared Redis session store.
Monitor LB 5xx rates and target health as primary alerts.
Place rate limiting at edge (API gateway) in addition to LB.
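
Connection draining, mentioned in the list above, can be pictured as a target that stops receiving new requests while its in-flight requests finish, up to a deregistration delay. The following is a rough model only: `Target`, `TargetGroup`, and the 30-second default delay are assumptions for illustration, and time is passed in explicitly so the behaviour is easy to follow.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Target:
    name: str
    in_flight: int = 0
    draining_since: Optional[float] = None  # None means accepting new requests

    def accepts_new_requests(self) -> bool:
        return self.draining_since is None


class TargetGroup:
    def __init__(self, targets, deregistration_delay: float = 30.0):
        self.targets = list(targets)
        self.deregistration_delay = deregistration_delay  # seconds (assumed value)

    def start_draining(self, target, now: Optional[float] = None):
        """Stop sending new requests to the target; existing ones continue."""
        target.draining_since = time.monotonic() if now is None else now

    def routable(self):
        return [t for t in self.targets if t.accepts_new_requests()]

    def reap(self, now: Optional[float] = None):
        """Remove draining targets that are idle or whose delay has expired."""
        now = time.monotonic() if now is None else now
        removed = []
        for t in list(self.targets):
            if t.draining_since is None:
                continue
            expired = now - t.draining_since >= self.deregistration_delay
            if t.in_flight == 0 or expired:
                self.targets.remove(t)
                removed.append(t)
        return removed


# Illustrative deploy step: drain one target, keep serving from the other.
old = Target("app-old", in_flight=2)
group = TargetGroup([old, Target("app-new")])
group.start_draining(old)
print([t.name for t in group.routable()])
```

During a rolling deploy, the orchestrator would call `start_draining` before stopping an instance and terminate it only after `reap` reports it removed.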

Common mistakes

Health check hits wrong port or auth-protected path — all nodes marked unhealthy.
Sticky sessions causing uneven load during scale events.
Running a single instance with no LB and no redundancy: zero fault tolerance.
Ignoring cross-AZ LB data transfer costs at scale.
L7 LB idle timeout set shorter than the WebSocket keepalive interval, so idle connections are dropped.

Advanced interview questions

Q1 (Beginner): What does a load balancer do?

Distributes client requests across multiple servers and removes unhealthy backends from rotation.

Q2 (Beginner): L4 vs L7 load balancer?

L4 routes TCP/UDP by IP/port; L7 routes HTTP by path, headers, and content.

Q3 (Intermediate): When should you use consistent hashing?

When cache locality or session affinity to specific shard/node matters — minimizes remapping on node add/remove.
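
To make the "minimizes remapping" point concrete, here is a small hash ring with virtual nodes. It is a sketch, not a production design: the class name `HashRing`, the node names, the choice of SHA-256 as the hash, and 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # SHA-256 is used here only as a stable, well-spread hash function.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    def add(self, node: str):
        # Each node owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


# Compare key ownership before and after adding a fourth node.
ring = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"user-{i}" for i in range(2000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-4")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

Every key that changes owner moves to the newly added node, while the rest stay put. With plain `hash(key) % N` routing, changing N would reassign most keys instead.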

Q4 (Intermediate): How do load balancers interact with auto-scaling?

Newly launched instances register with the target group; the LB starts routing to them once their health checks pass.

Q5 (Advanced): Design an LB strategy for a global chat API.

Anycast or geo-aware LB to the nearest region; L7 path /ws routed to a WebSocket gateway pool using least connections; health checks on readiness; internal LB to chat services; DNS failover with a 60s TTL.

Summary

Load balancers enable horizontal scaling and high availability. An L7 LB adds path routing, TLS termination, and HTTP-aware policies. Health checks must reflect real readiness, not just that the process is up. Prefer stateless backends over sticky sessions. Global load balancing pairs with multi-region DR designs. Reverse proxies overlap with load balancers; the next lesson clarifies the distinction.
