High-Level Design Tutorial: Lesson 11

    Load Balancers


    Introduction

    A load balancer (LB) distributes incoming traffic across multiple backend servers to improve capacity, fault tolerance, and maintenance flexibility. Clients connect to the LB's virtual IP; the LB forwards requests using algorithms like round-robin, least connections, or consistent hashing.
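To make two of the algorithms named above concrete, here is a minimal in-memory sketch of round-robin and least-connections selection. The class names, the backend labels (`s1` to `s3`), and the `pick`/`release` methods are illustrative assumptions, not the API of any particular load balancer.

```python
import itertools


class RoundRobin:
    """Cycle through backends in a fixed order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest backend.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a request or connection finishes.
        self.active[backend] -= 1


rr = RoundRobin(["s1", "s2", "s3"])
rr_picks = [rr.pick() for _ in range(5)]

lc = LeastConnections(["s1", "s2", "s3"])
first_three = [lc.pick() for _ in range(3)]  # one connection on each backend
lc.release("s2")                             # s2 finishes its request
next_pick = lc.pick()                        # s2 is now the least loaded
```

Round-robin ignores how busy each backend is, while least connections needs the LB to track open connections per backend; that bookkeeping is the price of handling long-lived requests more evenly.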

    L4 load balancers (TCP/UDP) route by IP address and port; L7 load balancers (HTTP) route by URL path, headers, and cookies. Cloud providers offer managed LBs (AWS ALB/NLB, GCP Cloud Load Balancing) that integrate with health checks and auto-scaling groups.

    This lesson covers LB placement in HLD, algorithm selection, health checks, SSL termination, and sticky session trade-offs.

    Understanding the topic

    Key concepts

    • L4 (transport): fast, protocol-agnostic, good for database proxies and UDP game traffic.
    • L7 (application): path-based routing, TLS termination, WAF integration.
    • Algorithms: round-robin, weighted, least connections, IP hash, consistent hash (cache locality).
    • Health checks: HTTP /health, TCP connect, or custom; unhealthy nodes are removed from rotation.
    • SSL termination at LB reduces CPU on app servers; re-encrypt to backends optional.
    • Active-active vs active-passive pairs for LB itself (DNS failover, anycast).
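The cache-locality benefit of consistent hashing in the list above is easiest to see by counting how many keys move when a node joins. The following is a minimal sketch of a hash ring with virtual nodes; the `HashRing` class, the choice of MD5, the 100 virtual nodes per server, and the `user-N` keys are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of MD5 as an integer; used for placement, not for security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns many points so load spreads evenly around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        # Walk clockwise to the first point past the key's hash, wrapping at the end.
        # Rebuilding the hash list per lookup keeps the sketch short; cache it in real code.
        hashes = [h for h, _ in self._ring]
        idx = bisect.bisect(hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(1000)]
ring = HashRing(["s1", "s2", "s3"])
before = {k: ring.lookup(k) for k in keys}

ring.add("s4")
after = {k: ring.lookup(k) for k in keys}

# Only keys claimed by the new node change owner; everything else stays put.
moved = [k for k in keys if before[k] != after[k]]
```

With plain `hash(key) % N`, going from 3 to 4 servers would remap most keys. On the ring, every key in `moved` lands on `s4`, and the rest keep their original server and therefore their warm cache.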

    Internal architecture

    Architecture overview

    text
    flowchart LR
    Client --> LB[Load Balancer]
    LB --> S1[Server 1]
    LB --> S2[Server 2]
    LB --> S3[Server 3]

    Step-by-step explanation

    1. Internet → DNS → Cloud LB (TLS termination) → target group of N app instances.
    2. LB health check every 10s on /ready; remove instance on 3 failures.
    3. Auto-scaling group registers new instances with target group automatically.
    4. Separate internal LB for service-to-service traffic in private VPC.
    5. WebSocket: L7 LB with connection stickiness or dedicated gateway tier.
    6. Global: GeoDNS or anycast LB routes to nearest region; health-based failover.
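Step 2 above (remove an instance after 3 failed checks) can be sketched as a small state tracker. This is an illustration under assumptions: the failures are treated as consecutive, the `healthy_threshold` of 2 for re-admitting an instance is not from the lesson, and probe results are passed in as booleans rather than fetched from `/ready` over the network.

```python
class HealthTracker:
    """Track probe results and decide which backends stay in rotation."""

    def __init__(self, backends, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold  # assumption, not from the lesson
        self.healthy = {b: True for b in backends}
        # Consecutive probe results that contradict the current state.
        self.streak = {b: 0 for b in backends}

    def record(self, backend, ok):
        if ok == self.healthy[backend]:
            self.streak[backend] = 0  # result agrees with state; reset the streak
            return
        self.streak[backend] += 1
        limit = self.healthy_threshold if ok else self.unhealthy_threshold
        if self.streak[backend] >= limit:
            self.healthy[backend] = ok
            self.streak[backend] = 0

    def in_rotation(self):
        return [b for b, up in self.healthy.items() if up]


tracker = HealthTracker(["s1", "s2"])

tracker.record("s2", False)
tracker.record("s2", False)
after_two = tracker.in_rotation()       # two failures: still in rotation

tracker.record("s2", False)
after_three = tracker.in_rotation()     # third failure: removed

tracker.record("s2", True)
tracker.record("s2", True)
after_recovery = tracker.in_rotation()  # two successes: back in rotation
```

Requiring several results in a row before changing state keeps one slow probe from ejecting a healthy instance, at the cost of reacting later to a real outage (about 30 seconds with a 10-second interval and a threshold of 3).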

    Informative example

    Spring Boot actuator health for LB probes — separate liveness from readiness:

    yaml
    management:
      endpoints:
        web:
          exposure:
            include: health,info,prometheus
      endpoint:
        health:
          probes:
            enabled: true
          group:
            liveness:
              include: livenessState
            readiness:
              include: readinessState,db,redis
    spring:
      datasource:
        url: jdbc:postgresql://db.internal:5432/shop
      data:
        redis:
          host: redis.internal
    # AWS ALB target group health check path: /actuator/health/readiness

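    Dependency checks (db, redis) belong in the readiness group only. A database or Redis outage then takes the instance out of rotation without failing liveness, so the orchestrator does not restart a healthy process in a loop. Point the LB health check at the readiness path only.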
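The same split can be shown outside Spring as two tiny functions. This is a framework-neutral sketch; the function names and the dependency dictionary are hypothetical and only mirror the idea of the probe groups.

```python
def liveness(process_ok):
    """Status code for the liveness probe: depends on the process alone."""
    return 200 if process_ok else 503


def readiness(process_ok, dependencies):
    """Status code for the readiness probe: process plus every dependency."""
    ready = process_ok and all(dependencies.values())
    return 200 if ready else 503


# Redis is down: the instance should leave rotation but must not be restarted.
deps = {"db": True, "redis": False}
live_status = liveness(process_ok=True)
ready_status = readiness(process_ok=True, dependencies=deps)
```

Here `live_status` stays 200 while `ready_status` drops to 503, which is the signal the LB needs to stop sending traffic without anything killing the process.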

    Real-world use

    Real-world use cases

    • E-commerce web tier during holiday traffic — horizontal pods + ALB.
    • Banking API gateway cluster with weighted routing for canary releases.
    • OTT API: L7 path routing /api vs /static to different target groups.
    • Multi-region failover: DNS LB health checks shift traffic to secondary region.
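Two of the use cases above, path routing to separate target groups and weighted routing for canary releases, can be sketched in a few lines. The route table, the target names, the 9:1 weights, and the choice of smooth weighted round-robin are assumptions made for illustration; managed LBs expose these as listener rules and target-group weights rather than code.

```python
ROUTES = {
    "/api": ["api-1", "api-2"],
    "/static": ["static-1"],
}
DEFAULT_GROUP = ["web-1"]


def match_target_group(path, routes=ROUTES, default=DEFAULT_GROUP):
    """Return the target group for the longest matching path prefix."""
    best = None
    for prefix in routes:
        # Match whole path segments so "/apix" does not match "/api".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else default


class SmoothWeighted:
    """Smooth weighted round-robin: deterministic and evenly interleaved."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {name: 0 for name in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


api_group = match_target_group("/api/orders")
static_group = match_target_group("/static/logo.png")
fallback_group = match_target_group("/apix")

# Send 1 request in 10 to the canary build.
splitter = SmoothWeighted({"stable": 9, "canary": 1})
picks = [splitter.pick() for _ in range(100)]
canary_share = picks.count("canary")
```

A deterministic splitter gives an exact 10% share over every 10 requests, which makes canary metrics easier to compare than a random split at low traffic.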

    Best practices

    • Terminate TLS at LB with modern cipher policies; rotate certs via ACM/Let's Encrypt.
    • Use least connections for long-lived requests; round-robin for uniform short API calls.
    • Enable connection draining during deploys (deregistration delay).
    • Avoid sticky sessions unless required — prefer shared Redis session store.
    • Monitor LB 5xx rates and target health as primary alerts.
    • Place rate limiting at edge (API gateway) in addition to LB.
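Connection draining from the list above can be sketched as follows: a deregistering backend stops receiving new requests and is removed once its in-flight requests finish or the delay expires, whichever comes first. The `Pool` class, its method names, the 30-second delay, and the explicit `now` timestamps are assumptions for illustration.

```python
class Pool:
    def __init__(self, backends, deregistration_delay=30):
        self.deregistration_delay = deregistration_delay
        self.in_flight = {b: 0 for b in backends}
        self.drain_started = {}  # backend -> time draining began

    def eligible(self):
        """Backends that may receive new requests."""
        return [b for b in self.in_flight if b not in self.drain_started]

    def begin(self, backend):
        if backend in self.drain_started:
            raise ValueError(f"{backend} is draining")
        self.in_flight[backend] += 1

    def finish(self, backend):
        self.in_flight[backend] -= 1

    def deregister(self, backend, now):
        # Stop new traffic but let existing requests complete.
        self.drain_started[backend] = now

    def reap(self, now):
        """Remove draining backends that are idle or past the delay."""
        removed = []
        for backend, started in list(self.drain_started.items()):
            idle = self.in_flight[backend] == 0
            expired = now - started >= self.deregistration_delay
            if idle or expired:
                del self.drain_started[backend]
                del self.in_flight[backend]
                removed.append(backend)
        return removed


pool = Pool(["s1", "s2"], deregistration_delay=30)
pool.begin("s1")                        # one request in flight on s1
pool.deregister("s1", now=0)            # deploy starts: drain s1

eligible_during_drain = pool.eligible()
removed_early = pool.reap(now=10)       # still busy, delay not reached
pool.finish("s1")
removed_after_finish = pool.reap(now=12)
```

Set the delay a little longer than the slowest request you expect; too short cuts off in-flight work, and too long slows every deploy.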

    Common mistakes

    • Health check hits wrong port or auth-protected path — all nodes marked unhealthy.
    • Sticky sessions causing uneven load during scale events.
    • Running a single instance with no LB in front: zero fault tolerance.
    • Ignoring cross-AZ LB data transfer costs at scale.
    • L7 LB idle timeout set shorter than the WebSocket keepalive interval, so idle connections get dropped.

    Advanced interview questions

    Q1 (Beginner): What does a load balancer do?
    It distributes client requests across multiple servers and removes unhealthy backends from rotation.
    Q2 (Beginner): What is the difference between an L4 and an L7 load balancer?
    L4 routes TCP/UDP by IP/port; L7 routes HTTP by path, headers, and content.
    Q3 (Intermediate): When should you use consistent hashing?
    When cache locality or session affinity to a specific shard or node matters; it minimizes remapping when nodes are added or removed.
    Q4 (Intermediate): How do load balancers interact with auto-scaling?
    Newly scaled instances register with the target group; the LB starts routing to them after the health check passes.
    Q5 (Advanced): Design an LB strategy for a global chat API.
    Anycast/geo LB to the nearest region, L7 path /ws to a WebSocket gateway pool, least connections, health checks on readiness, internal LB to chat services, and failover DNS with a 60s TTL.

    Summary

    Load balancers enable horizontal scaling and high availability. An L7 LB adds path routing, TLS termination, and HTTP-aware policies. Health checks must reflect real readiness, not just that the process is up. Prefer stateless backends over sticky sessions. Global LBs pair with multi-region DR designs. Reverse proxies overlap with load balancers; the next lesson clarifies the distinction.
