High-Level Design Tutorial, Lesson 27

    Circuit Breaker

    The circuit breaker pattern stops calling a failing downstream service after errors exceed a threshold — failing fast instead of hanging threads and cascading outages.

    Introduction

    A circuit breaker wraps calls to a downstream service and stops making them once errors exceed a threshold, so callers fail fast instead of tying up threads and spreading the outage. It has three states: Closed (normal operation), Open (calls rejected), and Half-Open (trial probes). Resilience4j implements the pattern as a Java library, and Istio implements it at the service mesh layer.

    In high-level design (HLD), circuit breakers are paired with timeouts, bulkheads, and fallbacks. When the payment service is down, degrade gracefully (serve a cached quote, or ask the user to retry later) rather than stalling checkout.

    This lesson covers configuration, observability, and where to place breakers in microservice diagrams during interviews.

    Understanding the topic

    Key concepts

    • Closed: requests pass through; failures are counted toward the threshold.
    • Open: calls fail immediately or go to a fallback; after a wait period the breaker moves to Half-Open to probe.
    • Half-Open: a limited number of trial requests are allowed; success closes the breaker, failure reopens it.
    • Trip condition: failure rate over a sliding window, or a count of consecutive failures.
    • Bulkhead: isolate thread pools per dependency, so a slow payment service cannot exhaust the catalog pool.
    • Fallback: a static response, cached data, or a queued async request. For money, never return silently wrong data.
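
    The difference between a sliding-window failure rate and a consecutive-failure count can be sketched in a few lines. The class below is a hypothetical, count-based window written for illustration (the name SlidingWindowFailureRate and its methods are assumptions, not a library API); it keeps the outcomes of the last N calls in a ring buffer.

```java
// Count-based sliding window over the last N call outcomes.
// Illustrative sketch only; class and method names are hypothetical.
final class SlidingWindowFailureRate {
    private final boolean[] failed;   // ring buffer of outcomes (true = failure)
    private int next = 0;             // slot that will be overwritten next
    private int recorded = 0;         // slots filled so far (at most the window size)
    private int failures = 0;         // failures currently inside the window

    SlidingWindowFailureRate(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.failed = new boolean[windowSize];
    }

    synchronized void record(boolean failure) {
        if (recorded == failed.length) {
            if (failed[next]) {
                failures--;           // evict the oldest outcome
            }
        } else {
            recorded++;
        }
        failed[next] = failure;
        if (failure) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    /** Failure rate in percent over the recorded calls, or 0 if none were recorded. */
    synchronized double failureRatePercent() {
        return recorded == 0 ? 0.0 : 100.0 * failures / recorded;
    }

    /** True once the window is full and the rate meets the threshold. */
    synchronized boolean shouldOpen(double thresholdPercent) {
        return recorded == failed.length && failureRatePercent() >= thresholdPercent;
    }
}
```

    With a window of 4, the sequence fail, success, fail, success reaches a 50% failure rate, while a consecutive-failure counter with a threshold of 2 would never trip on it. That is why rate-based windows tend to catch intermittent degradation that a consecutive count misses.
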
    text
    stateDiagram-v2
    [*] --> Closed
    Closed --> Open : failures exceed threshold
    Open --> HalfOpen : wait duration elapsed
    HalfOpen --> Closed : success
    HalfOpen --> Open : failure
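
    The state diagram can be turned into a small state machine. What follows is a minimal sketch under stated assumptions, not how Resilience4j implements it: SimpleCircuitBreaker is a hypothetical class, it trips on consecutive failures rather than a failure rate, it treats any RuntimeException as a failure, and it does not limit how many probes run concurrently in Half-Open. The clock is injected so that the Open to Half-Open transition can be tested without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal three-state breaker driven by a consecutive-failure count.
// Hypothetical sketch for illustration only.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long openWaitMillis;    // how long to stay OPEN before probing
    private final LongSupplier clock;     // injectable millisecond clock
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openWaitMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openWaitMillis = openWaitMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // OPEN -> HALF_OPEN once the wait duration has elapsed.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openWaitMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();        // fail fast without touching the dependency
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        if (state != State.OPEN) {        // ignore late successes after a trip
            consecutiveFailures = 0;
            state = State.CLOSED;         // HALF_OPEN -> CLOSED on a successful probe
        }
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;           // trip, or reopen after a failed probe
            openedAt = clock.getAsLong();
        }
    }
}
```

    A caller would use it as breaker.call(() -> client.fetch(), () -> cachedValue), where client and cachedValue stand in for whatever the application provides.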

    Internal architecture

    Step-by-step explanation

    1. The API gateway or service client wraps each downstream HTTP/gRPC call in a breaker.
    2. Configure the breaker, for example failureRateThreshold 50%, waitDurationInOpenState 30s, permittedNumberOfCallsInHalfOpenState 5.
    3. The fallback returns 503 with a Retry-After header, or a cached product list without prices.
    4. Export metrics (breaker state, slow-call rate, rejected calls) and alert on them in Grafana.
    5. Combine the breaker with a call timeout (2s) that is shorter than the client's patience (5s).
    6. Run chaos tests: kill the dependency and verify that the breaker opens and then recovers.
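
    Steps 3 and 5 can be sketched with the JDK alone. The helper below is a hypothetical illustration (TimedCall and callWithTimeout are assumed names): it bounds a downstream call with a timeout and returns a fallback value on timeout or error. It assumes Java 9 or later for CompletableFuture.orTimeout.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: bound a downstream call with a timeout and fall back on any failure.
final class TimedCall {
    private TimedCall() {}

    static <T> T callWithTimeout(Supplier<T> downstream, Duration timeout, T fallback) {
        return CompletableFuture
                .supplyAsync(downstream)                               // runs on the common pool
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)  // completes exceptionally on timeout
                .exceptionally(error -> fallback)                      // timeout or downstream error
                .join();
    }
}
```

    Note that orTimeout releases the caller but does not interrupt the task that is still running, so a real client should also set connect and read timeouts on the HTTP call itself. Keeping the call timeout below the client's patience leaves room to return the fallback before the user gives up.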

    Informative example

    Resilience4j circuit breaker on payment client with fallback in Spring Boot 3:

    java
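    import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
    import io.github.resilience4j.circuitbreaker.CircuitBreaker;
    import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
    import org.springframework.stereotype.Service;
    import org.springframework.web.client.RestClient;

    // ChargeRequest and PaymentResult are application types assumed to exist elsewhere.
    @Service
    public class PaymentClient {
        private final RestClient rest;
        private final CircuitBreaker breaker;

        public PaymentClient(RestClient.Builder builder, CircuitBreakerRegistry registry) {
            this.rest = builder.baseUrl("http://payment-service").build();
            // Looks up (or creates) the breaker configured under instances.payment.
            this.breaker = registry.circuitBreaker("payment");
        }

        public PaymentResult charge(ChargeRequest req) {
            try {
                // The breaker records the outcome of every call it lets through.
                return breaker.executeSupplier(() ->
                        rest.post().uri("/charges").body(req).retrieve().body(PaymentResult.class));
            } catch (CallNotPermittedException e) {
                // Breaker is open: defer the charge instead of reporting success.
                return PaymentResult.deferred(req.orderId());
            }
        }
    }
    // application.yml
    // resilience4j.circuitbreaker.instances.payment.failure-rate-threshold: 50
    // resilience4j.circuitbreaker.instances.payment.wait-duration-in-open-state: 30s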

    Never fall back to a fake 'paid' result when real money is involved. Defer the charge or queue it for retry, and tell the user what happened.

    Real-world use

    Real-world use cases

    • E-commerce checkout when the payment provider's latency spikes.
    • A social feed that mixes in an optional recommendation service: when the breaker is open, the feed skips the enrichment.
    • Banking FX rate fetches from an external API, with cached rates as the fallback.
    • Ride-hailing ETA display that degrades gracefully while the maps ETA service is down.

    Best practices

    • Set a timeout on every outbound call; the breaker sits on top of it.
    • Tune thresholds from production error budgets, not library defaults.
    • Expose breaker metrics in dashboards.
    • Limit Half-Open to a few trial calls so a recovering service is not hit by a thundering herd of retries.
    • Document fallback behavior in API contracts.
    • Rate-limit callers while a dependency is down.
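
    As a rough idea of what "expose breaker metrics" involves, the sketch below keeps three counters that a dashboard exporter could read. BreakerMetrics and its method names are hypothetical; a real service would more likely publish the metrics its breaker library already emits through its monitoring stack.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counters for one breaker; LongAdder keeps updates cheap under contention.
final class BreakerMetrics {
    private final LongAdder succeeded = new LongAdder();
    private final LongAdder failed = new LongAdder();
    private final LongAdder rejected = new LongAdder();   // calls refused while OPEN

    void recordSuccess() { succeeded.increment(); }
    void recordFailure() { failed.increment(); }
    void recordRejected() { rejected.increment(); }

    long rejectedCalls() { return rejected.sum(); }

    /** Failure rate in percent over calls that reached the dependency. */
    double failureRatePercent() {
        long bad = failed.sum();
        long total = succeeded.sum() + bad;
        return total == 0 ? 0.0 : 100.0 * bad / total;
    }

    /** One-line summary suitable for a log line or a debug endpoint. */
    String snapshot() {
        return "succeeded=" + succeeded.sum() + " failed=" + failed.sum() + " rejected=" + rejected.sum();
    }
}
```

    Rejected calls are counted separately from failures because they never reached the dependency; a rising rejected count is the signal that users are currently seeing the fallback.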

    Common mistakes

    • Using a breaker without a timeout: threads still hang, and the breaker trips late.
    • Returning incorrect financial state from a fallback.
    • Sharing one breaker across unrelated dependencies, which causes false opens.
    • Never testing the Half-Open recovery path.
    • Opening the breaker without alerting anyone, so users get a silently degraded experience.
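
    The shared-breaker mistake is easiest to see with per-dependency state. The registry below is a deliberately simplified, hypothetical sketch (PerDependencyBreakers is an assumed name): it tracks consecutive failures per dependency name and omits the Open to Half-Open timing entirely, so it only illustrates isolation, not recovery.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry: one failure counter per dependency name,
// so failures in one dependency never trip the breaker of another.
final class PerDependencyBreakers {
    private final int failureThreshold;
    private final Map<String, AtomicInteger> consecutiveFailures = new ConcurrentHashMap<>();

    PerDependencyBreakers(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    void recordFailure(String dependency) {
        counter(dependency).incrementAndGet();
    }

    void recordSuccess(String dependency) {
        counter(dependency).set(0);       // any success resets the streak
    }

    boolean isOpen(String dependency) {
        return counter(dependency).get() >= failureThreshold;
    }

    private AtomicInteger counter(String dependency) {
        // Lazily create an independent counter the first time a name is seen.
        return consecutiveFailures.computeIfAbsent(dependency, name -> new AtomicInteger());
    }
}
```

    With a single shared counter, three payment failures would also block catalog calls; keyed by name, only the payment entry opens.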

    Advanced interview questions

    Q1 (Beginner): What does a circuit breaker do?
    It stops calls to a failing dependency once failures pass a threshold, failing fast to prevent a cascade.
    Q2 (Beginner): Name the circuit breaker states.
    Closed, Open, Half-Open.
    Q3 (Intermediate): Circuit breaker vs retry?
    Retries help with transient failures; a breaker stops wasting resources during a sustained outage. Use them together, with limits on the retries.
    Q4 (Intermediate): What is the bulkhead pattern?
    Isolate resource pools per dependency so one slow service cannot exhaust all threads.
    Q5 (Advanced): Design resilience for the recommendation service on a product page.
    Use a 2s timeout; a breaker that opens at a 40% failure rate and stays open for 20s; a fallback that omits recommendations; a bulkhead of 20 threads; a 5-minute cache of the last good recommendations; and a monthly chaos test.
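
    The bulkhead from Q4 and Q5 can be sketched with a semaphore. This is one possible variant, assumed here for brevity: it caps concurrent calls instead of giving the dependency its own thread pool, and SemaphoreBulkhead is a hypothetical class name. When the bulkhead is full, the call goes straight to the fallback rather than queuing.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Semaphore bulkhead: caps the number of concurrent calls to one dependency.
// Hypothetical sketch; a thread-pool bulkhead would isolate threads as well.
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();        // bulkhead full: do not queue, degrade instead
        }
        try {
            return action.get();
        } finally {
            permits.release();            // always return the permit, even on exceptions
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

    For the product page in Q5, the bulkhead would be created with 20 permits and the fallback would return an empty recommendation list, so the rest of the page still renders.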

    Summary

    Circuit breakers fail fast when dependencies are unhealthy. Calls flow in Closed, are rejected in Open, and Half-Open probes test for recovery. Pair breakers with timeouts, bulkheads, and thoughtful fallbacks. Resilience4j and service meshes implement breakers at different layers. Never fake success in a financial fallback. Retries handle transient failures; the breaker takes over when failures are sustained.
