Circuit Breaker
The circuit breaker pattern stops calling a failing downstream service once errors exceed a threshold, failing fast instead of tying up threads and letting the outage cascade.
Introduction
A breaker has three states: Closed (normal operation), Open (calls are rejected), and Half-Open (trial probes). Resilience4j implements the pattern in-process for Java services, while Istio applies it at the service mesh layer.
In high-level design (HLD), circuit breakers are paired with timeouts, bulkheads, and fallbacks. When the payment service is down, checkout should degrade gracefully (show a cached quote, offer to retry later) rather than stall.
This lesson covers configuration, observability, and interview placement in microservice diagrams.
Understanding the topic
Key concepts
- Closed: requests pass through; failures are counted toward the threshold.
- Open: calls fail immediately or go to the fallback; after a wait period the breaker moves to half-open to probe.
- Half-open: a limited number of trial requests are allowed; success closes the breaker, failure reopens it.
- Trip condition: failure rate over a sliding window vs. a count of consecutive failures.
- Bulkhead: isolate thread pools per dependency, so a slow payment service cannot exhaust the catalog pool.
- Fallback: static response, cached data, or a queued async request. Never silently return wrong data where money is involved.
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : failures exceed threshold
    Open --> HalfOpen : timeout
    HalfOpen --> Closed : success
    HalfOpen --> Open : failure
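The bulkhead concept from the list above can be illustrated with a small sketch. Note that it swaps in a semaphore to cap concurrent calls rather than a dedicated thread pool per dependency; the `Bulkhead` class and its method names are hypothetical and not taken from any library.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Semaphore-based bulkhead (illustrative): caps concurrent calls to one dependency. */
class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the action if a permit is free; otherwise returns the fallback immediately. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();   // compartment is full: shed load instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release();       // always free the slot, even if the action throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Creating one `Bulkhead` per dependency (one for payment, one for catalog) means a saturated payment compartment rejects further payment calls while catalog calls continue to run.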
Internal architecture
Architecture overview
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : failures exceed threshold
    Open --> HalfOpen : timeout
    HalfOpen --> Closed : success
    HalfOpen --> Open : failure
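The transitions in the diagram can be sketched in plain Java. This is a simplified illustration, not how Resilience4j is implemented: it evaluates a fixed window of calls rather than a true sliding window, reopens on the first half-open failure, and is not thread-safe. The class name, constructor parameters, and injected clock are all assumptions made for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal count-based circuit breaker; single-threaded, for illustration only. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // calls evaluated per window while closed
    private final double failureRateThreshold; // 0.5 means 50%
    private final long openWaitMillis;         // how long to stay open before probing
    private final int halfOpenTrials;          // successful probes needed to close
    private final LongSupplier clock;          // injected so tests can control time

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold,
                         long openWaitMillis, int halfOpenTrials, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openWaitMillis = openWaitMillis;
        this.halfOpenTrials = halfOpenTrials;
        this.clock = clock;
    }

    /** Current state; an open breaker becomes half-open once the wait has elapsed. */
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openWaitMillis) {
            moveTo(State.HALF_OPEN);
        }
        return state;
    }

    /** Runs the action unless the breaker is open; failures are recorded and routed to the fallback. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();             // fail fast without touching the dependency
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private void record(boolean failed) {
        calls++;
        if (failed) failures++;
        if (state == State.HALF_OPEN) {
            if (failed) moveTo(State.OPEN);                           // probe failed: reopen
            else if (calls >= halfOpenTrials) moveTo(State.CLOSED);   // all probes passed
        } else if (calls >= windowSize) {
            boolean tooManyFailures = (double) failures / calls >= failureRateThreshold;
            moveTo(tooManyFailures ? State.OPEN : State.CLOSED);      // either way, start a new window
        }
    }

    private void moveTo(State next) {
        state = next;
        calls = 0;
        failures = 0;
        if (next == State.OPEN) openedAt = clock.getAsLong();
    }
}
```

Passing the clock as a `LongSupplier` keeps the open-state wait testable without sleeping; production code would typically pass `System::currentTimeMillis` and add synchronization.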
Step-by-step explanation
- API Gateway or service client wraps downstream HTTP/gRPC with breaker.
- Configure failureRateThreshold to 50 (percent), waitDurationInOpenState to 30s, and permittedNumberOfCallsInHalfOpenState to 5.
- Fallback returns 503 with Retry-After or cached product list without prices.
- Metrics: breaker state, slow call rate, rejected calls → Grafana alerts.
- Combine with timeout (2s) shorter than client patience (5s).
- Test chaos: kill dependency, verify breaker opens and recovers.
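The timeout step above can be sketched with the standard library alone. This is a minimal illustration, assuming the downstream call is wrapped in a `Supplier`; the `TimedCall` helper is hypothetical. It bounds how long the caller waits, but does not cancel the underlying work, so the HTTP or gRPC client should still carry its own connect and read timeouts.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative helper: bounds the caller's wait and substitutes a fallback. */
class TimedCall {
    /**
     * Runs the call on the common pool and waits at most `timeout` for it.
     * Returns the fallback if the call throws or does not finish in time.
     * The slow task itself keeps running in the background; it is not interrupted.
     */
    static <T> T callWithTimeout(Supplier<T> call, Duration timeout, T fallback) {
        return CompletableFuture.supplyAsync(call)
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }
}
```

With a 2s timeout on the downstream call and a 5s budget at the client, the caller still has time to return a fallback before the client gives up.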
Informative example
Resilience4j circuit breaker on a payment client with a fallback, using Spring Boot 3.2+ (required for RestClient) and Vavr's Try, which must be on the classpath:
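import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.vavr.control.Try;
import java.util.function.Supplier;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;

@Service
public class PaymentClient {
    private final RestClient rest;
    private final CircuitBreaker breaker;

    public PaymentClient(RestClient.Builder builder, CircuitBreakerRegistry registry) {
        this.rest = builder.baseUrl("http://payment-service").build();
        this.breaker = registry.circuitBreaker("payment");
    }

    public PaymentResult charge(ChargeRequest req) {
        // Every call goes through the breaker so failures are counted.
        Supplier<PaymentResult> supplier = CircuitBreaker.decorateSupplier(breaker, () ->
                rest.post().uri("/charges").body(req).retrieve().body(PaymentResult.class));
        // When the breaker is open, defer the charge; other exceptions still propagate.
        return Try.ofSupplier(supplier)
                .recover(CallNotPermittedException.class, e -> PaymentResult.deferred(req.orderId()))
                .get();
    }
}

// application.yml
// resilience4j.circuitbreaker.instances.payment.failure-rate-threshold: 50
// resilience4j.circuitbreaker.instances.payment.wait-duration-in-open-state: 30s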
Never fall back to a fake 'paid' result for real money; defer the charge or queue it for retry, and tell the user what happened.
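The example calls `PaymentResult.deferred(...)` without showing the type. One possible shape is sketched below; the record, its `Status` enum, and `isPaid()` are assumptions for illustration, not the lesson's actual class. The point is that a deferred result carries an explicit status, so downstream code cannot mistake it for a completed charge.

```java
import java.util.Objects;

/** Hypothetical result type: the status makes "not charged yet" explicit. */
record PaymentResult(String orderId, Status status) {
    enum Status { PAID, DECLINED, DEFERRED }

    PaymentResult {
        Objects.requireNonNull(orderId, "orderId");
        Objects.requireNonNull(status, "status");
    }

    /** Fallback result: the charge was not attempted and should be retried later. */
    static PaymentResult deferred(String orderId) {
        return new PaymentResult(orderId, Status.DEFERRED);
    }

    /** Only a confirmed charge counts as paid; a deferred result never does. */
    boolean isPaid() {
        return status == Status.PAID;
    }
}
```

Order fulfilment would then check `isPaid()` before shipping, and a deferred order would show the user a "payment pending" message rather than a confirmation.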
Real-world use
Real-world use cases
- E-commerce checkout when payment provider latency spikes.
- Social feed mixing optional recommendation service — breaker skips enrichments.
- Banking FX rate fetch from external API with cached fallback rates.
- Maps ETA service degradation in ride-hailing during outage.
Best practices
- Set timeouts on every outbound call before breaker.
- Tune thresholds from production error budgets, not defaults.
- Expose breaker metrics in dashboards.
- Half-open probes prevent thundering retry on recovery.
- Document fallback behavior in API contracts.
- Combine with rate limiting on caller during upstream outage.
Common mistakes
- Breaker without timeout: threads still hang on slow calls, so the breaker trips late.
- Fallback returning incorrect financial state.
- Shared breaker for unrelated dependencies — false opens.
- Never testing half-open recovery path.
- Opening breaker but no alert — silent degraded UX.
Advanced interview questions
Q1 (Beginner): What does a circuit breaker do?
Q2 (Beginner): Name the circuit breaker states.
Q3 (Intermediate): Circuit breaker vs. retry?
Q4 (Intermediate): What is the bulkhead pattern?
Q5 (Advanced): Design resilience for a recommendation service on a product page.
Summary
Circuit breakers fail fast when dependencies are unhealthy. A breaker moves from Closed to Open when failures exceed the threshold, then probes for recovery through Half-Open. Pair breakers with timeouts, bulkheads, and thoughtful fallbacks. Resilience4j and service meshes implement breakers at different layers. Never fake success in a financial fallback. Retries handle transient failures; the breaker takes over when failures persist.