Distributed Transactions
Introduction
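A distributed transaction spans multiple services or databases with atomicity guarantees: either every participant commits or all roll back. Classic 2PC (two-phase commit) across microservices is slow and brittle, because participants hold locks while they wait for the coordinator's decision. Modern high-level design (HLD) instead favors sagas, which are sequences of local transactions paired with compensating actions.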
Interviewers frequently ask how to keep order, payment, and inventory consistent. Show a saga (choreographed or orchestrated) with idempotent steps and a transactional outbox, not XA transactions across HTTP services.
This lesson compares 2PC, saga orchestration vs choreography, and TCC (try-confirm-cancel) patterns.
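To make the comparison concrete, the 2PC decision rule can be sketched in memory. This is a minimal sketch, assuming hypothetical names (Participant, InMemoryParticipant, TwoPhaseCoordinator); a real XA coordinator also writes a durable decision log and handles timeouts and crash recovery, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract for two-phase commit (illustration only).
interface Participant {
    boolean prepare(String txId); // phase 1: vote yes (work staged, locks held) or no
    void commit(String txId);     // phase 2: make staged work durable, release locks
    void abort(String txId);      // phase 2: discard staged work, release locks
}

// Toy participant that records its state so the protocol can be observed.
final class InMemoryParticipant implements Participant {
    private final boolean voteYes;
    String state = "INIT";

    InMemoryParticipant(boolean voteYes) { this.voteYes = voteYes; }

    @Override public boolean prepare(String txId) {
        state = voteYes ? "PREPARED" : "ABORTED"; // a "no" voter aborts locally
        return voteYes;
    }
    @Override public void commit(String txId) { state = "COMMITTED"; }
    @Override public void abort(String txId) { state = "ABORTED"; }
}

final class TwoPhaseCoordinator {
    private final List<Participant> participants;

    TwoPhaseCoordinator(List<? extends Participant> participants) {
        this.participants = List.copyOf(participants);
    }

    /** Returns true if every participant committed, false if the transaction aborted. */
    boolean execute(String txId) {
        List<Participant> prepared = new ArrayList<>();
        // Phase 1: collect votes. Each yes-voter now holds locks and must wait
        // for the coordinator's decision; this wait is what makes 2PC blocking.
        for (Participant p : participants) {
            if (!p.prepare(txId)) {
                // One "no" vote aborts everything: roll back those already prepared.
                for (Participant q : prepared) q.abort(txId);
                return false;
            }
            prepared.add(p);
        }
        // Phase 2: every participant voted yes, so the decision is commit.
        for (Participant p : participants) p.commit(txId);
        return true;
    }
}
```

If the coordinator crashes between the two loops, prepared participants can neither commit nor abort on their own, which is the fragility the lesson warns about for long HTTP chains.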
Understanding the topic
Key concepts
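- 2PC: the coordinator asks every participant to prepare (vote), then tells all of them to commit or abort. It is blocking: prepared participants hold locks until the decision arrives, so it is ill-suited to long chains of HTTP calls.
- Saga: a sequence of local transactions; if a step fails, compensating transactions undo the steps that already completed (e.g., cancel an inventory reservation).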
- Orchestration: central saga manager directs steps (workflow engine).
- Choreography: services react to events without central coordinator.
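- TCC: Try reserves resources → Confirm commits them (e.g., captures the payment) → Cancel releases the holds if any Try fails.
- Saga messages are delivered at least once, so every step needs an idempotency key to ensure a redelivered message is not applied twice.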
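The TCC bullet can be illustrated with a single participant. This sketch assumes a hypothetical WalletTcc class that tracks balances in cents: Try places a hold, Confirm turns the hold into a charge, and Cancel returns the held funds. A TCC coordinator (not shown) would call Try on every participant and then Confirm all of them or Cancel all of them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical TCC participant: a wallet that supports holds (illustration only).
final class WalletTcc {
    private long available;                                   // spendable balance, cents
    private final Map<String, Long> holds = new HashMap<>();  // txId -> held cents

    WalletTcc(long initialCents) { this.available = initialCents; }

    // Try: set funds aside without spending them; fails on insufficient balance.
    synchronized boolean tryReserve(String txId, long cents) {
        if (holds.containsKey(txId)) return true;  // repeat of a still-pending Try
        if (available < cents) return false;
        available -= cents;
        holds.put(txId, cents);
        return true;
    }

    // Confirm: the hold becomes a real charge, so the money stays deducted.
    synchronized void confirm(String txId) {
        holds.remove(txId);                        // no-op if already confirmed/cancelled
    }

    // Cancel: release the hold and give the funds back.
    synchronized void cancel(String txId) {
        Long held = holds.remove(txId);
        if (held != null) available += held;       // no-op if nothing is held
    }

    synchronized long available() { return available; }

    synchronized long held() {
        return holds.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Making Confirm and Cancel no-ops for unknown transaction ids is what lets the coordinator retry them safely after a timeout.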
Internal architecture
Architecture overview
sequenceDiagram
    Order->>Inventory: reserve
    Order->>Payment: charge
    Order->>Inventory: confirm or rollback
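The flow in the diagram can be sketched as a small orchestrator. InventoryService, PaymentService, CheckoutOrchestrator, and RecordingInventory are hypothetical names, and direct method calls stand in for the asynchronous messages a production saga would exchange; the point is the ordering of steps and where the compensation runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service contracts (illustration only).
interface InventoryService {
    boolean reserve(String orderId);
    void confirm(String orderId);
    void release(String orderId); // compensating action for reserve
}

interface PaymentService {
    boolean charge(String orderId);
}

final class CheckoutOrchestrator {
    enum Status { CONFIRMED, CANCELLED }

    private final InventoryService inventory;
    private final PaymentService payment;

    CheckoutOrchestrator(InventoryService inventory, PaymentService payment) {
        this.inventory = inventory;
        this.payment = payment;
    }

    Status checkout(String orderId) {
        // Step 1: reserve stock. If it fails, nothing has happened yet to undo.
        if (!inventory.reserve(orderId)) {
            return Status.CANCELLED;
        }
        // Step 2: charge. On failure, run the compensation for step 1.
        if (!payment.charge(orderId)) {
            inventory.release(orderId);
            return Status.CANCELLED;
        }
        // Both steps succeeded: turn the reservation into a committed allocation.
        inventory.confirm(orderId);
        return Status.CONFIRMED;
    }
}

// Test double that records the calls it receives.
final class RecordingInventory implements InventoryService {
    final List<String> calls = new ArrayList<>();
    private final boolean reserveSucceeds;

    RecordingInventory(boolean reserveSucceeds) { this.reserveSucceeds = reserveSucceeds; }

    @Override public boolean reserve(String orderId) { calls.add("reserve " + orderId); return reserveSucceeds; }
    @Override public void confirm(String orderId) { calls.add("confirm " + orderId); }
    @Override public void release(String orderId) { calls.add("release " + orderId); }
}
```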
Step-by-step explanation
- Order service creates PENDING order → publishes OrderCreated.
- Inventory service reserves → InventoryReserved or InventoryFailed event.
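- Payment service charges the customer once inventory is reserved → PaymentCaptured or PaymentFailed.
- On PaymentFailed: Inventory releases the reservation (the compensating step) and Order marks the order CANCELLED.
- Saga state lives in an order_saga table or in a Temporal/Camunda workflow instance, so the flow can resume after a crash.
- A transactional outbox ties event publishing to the local DB commit: the event is written to an outbox table in the same transaction as the business change, and a relay forwards it to the broker afterward.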
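The outbox step can be sketched with an in-memory store standing in for the database. InMemoryOrderStore and OutboxRow are hypothetical names; in a real system saveOrderWithEvent would be one local DB transaction inserting into both tables, and the relay would be a poller or a change-data-capture process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical transactional-outbox sketch (in-memory stand-in for a database).
final class InMemoryOrderStore {
    record OutboxRow(long id, String topic, String key, String payload) {}

    final List<String> orders = new ArrayList<>();
    final List<OutboxRow> outbox = new ArrayList<>();
    private long nextId = 1;
    private long lastPublishedId = 0;

    // Stands in for: BEGIN; INSERT INTO orders ...; INSERT INTO outbox ...; COMMIT;
    synchronized void saveOrderWithEvent(String orderId) {
        orders.add(orderId);
        outbox.add(new OutboxRow(nextId++, "OrderCreated", orderId,
                "{\"orderId\":\"" + orderId + "\"}"));
    }

    // Relay: publish rows newer than the cursor, in id order. If publish throws,
    // the cursor is not advanced and the row is retried on the next run, so
    // delivery is at least once and consumers must be idempotent.
    synchronized int relay(Consumer<OutboxRow> publish) {
        int sent = 0;
        for (OutboxRow row : outbox) {
            if (row.id() <= lastPublishedId) continue;
            publish.accept(row);
            lastPublishedId = row.id();
            sent++;
        }
        return sent;
    }
}
```

Because the order row and its event are written together, there is no window in which an order exists but its OrderCreated event is lost.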
Informative example
Saga orchestration sketch with Spring and Kafka events — compensating release on payment failure:
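import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Order, OrderRepository, CreateOrderCommand, ReserveInventory, ReleaseInventory
// and PaymentFailedEvent are application types (not shown).
@Service
public class OrderSagaOrchestrator {
    private final OrderRepository orders;
    private final KafkaTemplate<String, Object> kafka;

    public OrderSagaOrchestrator(OrderRepository orders, KafkaTemplate<String, Object> kafka) {
        this.orders = orders;
        this.kafka = kafka;
    }

    // Saga start: persist the order as PENDING, then ask Inventory to reserve stock.
    // Note: save() and send() are not atomic here; an outbox closes that gap.
    public void start(CreateOrderCommand cmd) {
        Order order = orders.save(Order.pending(cmd));
        kafka.send("saga.inventory.reserve", order.id(),
                new ReserveInventory(order.id(), cmd.items()));
    }

    // Compensation path: payment failed, so release the reservation and cancel the order.
    @KafkaListener(topics = "saga.payment.failed", groupId = "order-saga")
    public void onPaymentFailed(PaymentFailedEvent e) {
        orders.findById(e.orderId()).ifPresent(order -> {
            kafka.send("saga.inventory.release", order.id(), new ReleaseInventory(order.id()));
            order.cancel();
            orders.save(order);
        });
    }
}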
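Each handler should be idempotent by orderId, because Kafka may redeliver a message after a consumer failure. For long-running sagas that need visibility into their progress, prefer a workflow engine such as Temporal.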
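An idempotent handler can be as simple as recording a per-step key before applying the side effect. This sketch assumes a hypothetical IdempotentInventoryHandler with an in-memory set; in production the processed keys would usually live in a table with a unique constraint, written in the same local transaction as the business change.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical idempotent consumer for the ReleaseInventory command (illustration only).
final class IdempotentInventoryHandler {
    private final Set<String> processed = new HashSet<>();
    private int releasedCount = 0;

    /** Returns true if the release was applied, false if it was a duplicate delivery. */
    synchronized boolean onReleaseInventory(String orderId) {
        String key = orderId + ":release";   // idempotency key = orderId + saga step
        if (!processed.add(key)) {
            return false;                    // already handled: skip the side effect
        }
        releasedCount++;                     // stands in for returning stock to the pool
        return true;
    }

    synchronized int releasedCount() { return releasedCount; }
}
```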
Real-world use
Real-world use cases
- E-commerce checkout: order, payment, warehouse, loyalty points.
- Travel booking: flight + hotel + car saga with compensations.
- Banking transfer between internal accounts and external ACH.
- Food delivery: restaurant confirm + driver assign + payment hold.
Best practices
- Design compensating actions for every forward step.
- Persist saga state for recovery after crash mid-flow.
- Time out stuck sagas and trigger compensation automatically.
- Use idempotency keys on every step endpoint.
- Monitor saga completion rate and stuck instances.
- Keep saga steps async via events to reduce coupling.
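Timing out stuck sagas usually means a periodic sweep over persisted saga state. This is a minimal sketch, assuming a hypothetical SagaTimeoutSweeper with in-memory state and a caller-chosen timeout; a real implementation would query the saga table (or rely on a workflow engine's timers) and publish compensation commands for the returned ids.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stuck-saga sweeper (illustration only).
final class SagaTimeoutSweeper {
    enum SagaState { PENDING, COMPLETED, COMPENSATING }

    private record Saga(Instant startedAt, SagaState state) {}

    private final Map<String, Saga> sagas = new HashMap<>();
    private final Duration timeout;

    SagaTimeoutSweeper(Duration timeout) { this.timeout = timeout; }

    synchronized void start(String sagaId, Instant now) {
        sagas.put(sagaId, new Saga(now, SagaState.PENDING));
    }

    synchronized void complete(String sagaId) {
        sagas.computeIfPresent(sagaId, (id, s) -> new Saga(s.startedAt(), SagaState.COMPLETED));
    }

    // Run periodically (e.g. from a scheduler). Sagas still PENDING past the
    // deadline move to COMPENSATING; their ids are returned so the caller can
    // send compensation commands. Already-compensating sagas are not returned again.
    synchronized List<String> sweep(Instant now) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Saga> e : sagas.entrySet()) {
            Saga s = e.getValue();
            if (s.state() == SagaState.PENDING && now.isAfter(s.startedAt().plus(timeout))) {
                e.setValue(new Saga(s.startedAt(), SagaState.COMPENSATING));
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    synchronized SagaState state(String sagaId) { return sagas.get(sagaId).state(); }
}
```

The number of ids returned per sweep is also a natural metric for the "monitor stuck instances" practice.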
Common mistakes
- Running 2PC across microservices over HTTP: long-held locks and a fragile coordinator.
- No compensation for inventory hold — ghost reservations.
- Processing duplicate events without idempotency, which double-charges customers.
- Saga without visibility — ops can't debug stuck orders.
- A long synchronous call chain that blocks the user's request thread.
Advanced interview questions
Q1 (Beginner): Why avoid 2PC in microservices?
Q2 (Beginner): What is a saga?
Q3 (Intermediate): Orchestration vs choreography saga?
Q4 (Intermediate): What is a compensating transaction?
Q5 (Advanced): Design checkout saga for flash sale.
Summary
Distributed transactions across services use sagas, not 2PC. Compensating actions undo or semantically reverse forward steps. Orchestration vs choreography is a coupling vs visibility trade-off. Idempotency and outbox are mandatory for reliable sagas. Workflow engines help long-running and observable sagas. Circuit breakers protect saga participants from cascade failures.