High-Level Design Tutorial, Lesson 26

    Distributed Transactions

    A distributed transaction spans multiple services or databases with atomicity guarantees — all succeed or all roll back.

    Introduction

    A distributed transaction spans multiple services or databases with atomicity guarantees: either every participant commits or all of them roll back. Classic 2PC (two-phase commit) across microservices is slow and brittle; modern HLD favors sagas, which are sequences of local transactions with compensating actions.

    Interviewers frequently ask about the order + payment + inventory flow. Show a saga (choreographed or orchestrated), idempotent steps, and the transactional outbox pattern, not XA transactions across HTTP services.

    This lesson compares 2PC, saga orchestration vs choreography, and TCC (try-confirm-cancel) patterns.

    Understanding the topic

    Key concepts

    • 2PC: the coordinator asks every participant to prepare, then to commit. Participants hold locks while they wait (blocking), so it is poorly suited to long HTTP call chains.
    • Saga: split the transaction into local steps; a failure triggers compensating transactions (e.g., cancel the reservation).
    • Orchestration: a central saga manager (often a workflow engine) directs each step.
    • Choreography: services react to each other's events without a central coordinator.
    • TCC: Try (reserve) → Confirm (capture) → Cancel (release the hold).
    • Saga steps are delivered at least once, so each step needs an idempotency key.
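
    The TCC bullet above can be illustrated with a minimal in-memory sketch. The class name TccInventory, its method names, and the txId parameter are hypothetical and chosen only for illustration; a real participant would persist holds in its own database rather than in a map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory TCC participant (illustration only, not a production design).
final class TccInventory {
    private int available;                                       // units neither held nor sold
    private final Map<String, Integer> holds = new HashMap<>();  // txId -> held quantity
    private final Set<String> closed = new HashSet<>();          // txIds already confirmed or cancelled

    TccInventory(int initialStock) {
        this.available = initialStock;
    }

    // Try: place a hold without selling. A repeated Try for the same txId is a no-op.
    synchronized boolean tryReserve(String txId, int qty) {
        if (closed.contains(txId)) return false;   // late Try after Confirm/Cancel is rejected
        if (holds.containsKey(txId)) return true;  // duplicate delivery of the same Try
        if (qty <= 0 || qty > available) return false;
        available -= qty;
        holds.put(txId, qty);
        return true;
    }

    // Confirm: the held units become a sale; they were already deducted during Try.
    synchronized void confirm(String txId) {
        if (holds.remove(txId) != null) closed.add(txId);
    }

    // Cancel: return held units to stock. Safe to repeat, and safe if Try never arrived.
    synchronized void cancel(String txId) {
        Integer qty = holds.remove(txId);
        if (qty != null) available += qty;
        closed.add(txId);
    }

    synchronized int available() {
        return available;
    }
}
```

    Recording closed transaction ids is one way to keep a delayed Try from re-creating a hold that was already cancelled; other designs are possible.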

    Internal architecture

    Architecture overview

    text
    sequenceDiagram
    Order->>Inventory: reserve
    Order->>Payment: charge
    Order->>Inventory: confirm or rollback
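
    The reserve, charge, then confirm-or-rollback flow in the diagram can be sketched as a small in-process saga runner. This is a simplified illustration under assumptions: the names SagaSketch, Step, and run are hypothetical, steps are plain Runnable callbacks instead of remote calls, and saga state is not persisted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-process saga runner (illustration only).
final class SagaSketch {
    // One saga step: a forward action plus the compensation that semantically undoes it.
    record Step(String name, Runnable action, Runnable compensation) {}

    // Runs steps in order. On the first failure, compensates the completed steps
    // in reverse order and reports false.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(List.of(
            new Step("reserve", () -> log.add("inventory reserved"), () -> log.add("inventory released")),
            new Step("charge", () -> { throw new IllegalStateException("card declined"); }, () -> log.add("payment refunded")),
            new Step("confirm", () -> log.add("inventory confirmed"), () -> {})));
        // prints: false [inventory reserved, inventory released]
        System.out.println(ok + " " + log);
    }
}
```

    The failed charge step is not compensated because it never completed; only the earlier reservation is released.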

    Step-by-step explanation

    1. The order service creates a PENDING order and publishes OrderCreated.
    2. The inventory service reserves stock and emits InventoryReserved or InventoryFailed.
    3. On InventoryReserved, the payment service captures payment and emits PaymentCaptured or PaymentFailed.
    4. On PaymentFailed, the inventory service compensates by releasing the hold, and the order service marks the order CANCELLED.
    5. Saga state is persisted in an order_saga table or in a Temporal/Camunda workflow instance.
    6. An outbox makes event publishing atomic with the local DB commit.
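
    Step 6 can be sketched as follows. This is an in-memory illustration under assumptions: OutboxSketch, OutboxRow, createOrder, and relay are hypothetical names, a synchronized method stands in for a local database transaction, and the publisher callback stands in for a message broker client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory outbox (illustration only; a real one uses a DB table).
final class OutboxSketch {
    record OutboxRow(long id, String eventType, String payload) {}

    private final Map<String, String> orders = new LinkedHashMap<>();  // orderId -> status
    private final List<OutboxRow> outbox = new ArrayList<>();          // rows not yet published
    private long nextId = 1;

    // Stands in for one local transaction: the order row and the outbox row
    // are written together, so there is never an order without its event.
    synchronized void createOrder(String orderId) {
        orders.put(orderId, "PENDING");
        outbox.add(new OutboxRow(nextId++, "OrderCreated", orderId));
    }

    // Relay: publish pending rows in insertion order and delete a row only after
    // the publish call returned. A crash between publish and delete would cause a
    // re-send, which is why consumers must tolerate at-least-once delivery.
    synchronized int relay(Consumer<OutboxRow> publisher) {
        int sent = 0;
        while (!outbox.isEmpty()) {
            OutboxRow row = outbox.get(0);
            try {
                publisher.accept(row);
            } catch (RuntimeException e) {
                break;  // broker unavailable: keep the row and retry on the next poll
            }
            outbox.remove(0);
            sent++;
        }
        return sent;
    }

    synchronized int pending() {
        return outbox.size();
    }
}
```

    The relay would typically run on a timer or be driven by change data capture; either way the event is never lost, though it may be delivered more than once.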

    Informative example

    Saga orchestration sketch with Spring and Kafka events — compensating release on payment failure:

    java
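    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    // Order, OrderRepository, and the command/event types are application classes not shown here.
    @Service
    public class OrderSagaOrchestrator {
        private final OrderRepository orders;
        private final KafkaTemplate<String, Object> kafka;

        public OrderSagaOrchestrator(OrderRepository orders, KafkaTemplate<String, Object> kafka) {
            this.orders = orders;
            this.kafka = kafka;
        }

        // Step 1: persist the PENDING order, then ask inventory to reserve stock.
        // Note: save + send is a dual write; production code would publish via an outbox.
        public void start(CreateOrderCommand cmd) {
            Order order = orders.save(Order.pending(cmd));
            kafka.send("saga.inventory.reserve", order.id(), new ReserveInventory(order.id(), cmd.items()));
        }

        // Compensation: on payment failure, release the inventory hold and cancel the order.
        // Redelivery re-sends the release, so the inventory handler must be idempotent by orderId.
        @KafkaListener(topics = "saga.payment.failed", groupId = "order-saga")
        public void onPaymentFailed(PaymentFailedEvent e) {
            orders.findById(e.orderId()).ifPresent(order -> {
                kafka.send("saga.inventory.release", order.id(), new ReleaseInventory(order.id()));
                order.cancel();
                orders.save(order);
            });
        }
    }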

    Each handler must be idempotent, keyed by orderId. For long-running sagas that need operational visibility, prefer a workflow engine such as Temporal.
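
    One way to make a handler idempotent by orderId is to record each processed key and skip repeats. The sketch below is an assumption-laden illustration: IdempotentHandler and its methods are hypothetical names, and the in-memory set stands in for a deduplication table that would normally be written in the same local transaction as the side effect.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical idempotent event handler (illustration only).
final class IdempotentHandler {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger releases = new AtomicInteger();

    // Returns true if the event was applied, false if it was a duplicate delivery.
    boolean onPaymentFailed(String orderId) {
        // Set.add returns false when the key is already present, so only the
        // first delivery for a given orderId reaches the side effect below.
        if (!processed.add("payment-failed:" + orderId)) {
            return false;
        }
        releases.incrementAndGet();  // stands in for releasing the inventory hold
        return true;
    }

    int releases() {
        return releases.get();
    }
}
```

    Delivering the same event twice applies the compensation once; the second call simply reports a duplicate.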

    Real-world use

    Real-world use cases

    • E-commerce checkout: order, payment, warehouse, loyalty points.
    • Travel booking: flight + hotel + car saga with compensations.
    • Banking transfer between internal accounts and external ACH.
    • Food delivery: restaurant confirm + driver assign + payment hold.

    Best practices

    • Design compensating actions for every forward step.
    • Persist saga state for recovery after crash mid-flow.
    • Timeout stuck sagas and trigger compensation automatically.
    • Use idempotency keys on every step endpoint.
    • Monitor saga completion rate and stuck instances.
    • Keep saga steps async via events to reduce coupling.
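
    The practices of persisting saga state and timing out stuck sagas can be combined in a periodic scan. The following is a minimal sketch, assuming a hypothetical SagaInstance record with a state string and last-update timestamp; the state names COMPLETED and CANCELLED are assumed terminal states, and a real scan would query the saga table instead of filtering a list.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical stuck-saga scanner (illustration only).
final class StuckSagaScanner {
    record SagaInstance(String orderId, String state, Instant updatedAt) {}

    // Returns sagas that are not in a terminal state and have not progressed
    // for longer than the timeout. Callers would trigger compensation for each.
    static List<SagaInstance> findStuck(List<SagaInstance> sagas, Instant now, Duration timeout) {
        return sagas.stream()
            .filter(s -> !s.state().equals("COMPLETED") && !s.state().equals("CANCELLED"))
            .filter(s -> Duration.between(s.updatedAt(), now).compareTo(timeout) > 0)
            .toList();
    }
}
```

    Passing the current time in as a parameter keeps the check deterministic and easy to test.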

    Common mistakes

    • Running distributed 2PC across microservices over HTTP: long-held locks and fragility.
    • No compensation for an inventory hold, leaving ghost reservations.
    • Processing duplicate events without idempotency, causing double charges.
    • Sagas without visibility: ops can't debug stuck orders.
    • A long synchronous chain that blocks the user's request thread.

    Advanced interview questions

    Q1 (Beginner): Why avoid 2PC in microservices?
    Blocking locks, the coordinator as a single point of failure, and a poor fit for long-running or cross-HTTP operations.
    Q2 (Beginner): What is a saga?
    A sequence of local transactions, with compensating actions that run if a step fails.
    Q3 (Intermediate): Orchestration vs choreography saga?
    Orchestration uses a central workflow that directs each step; choreography has services react to events in a decentralized way.
    Q4 (Intermediate): What is a compensating transaction?
    A semantic undo, e.g., releasing an inventory hold or refunding a payment. It is not always a literal DELETE.
    Q5 (Advanced): Design a checkout saga for a flash sale.
    Create the order as PENDING, reserve inventory asynchronously with a TTL, authorize payment, then confirm or release. Use idempotency keys, a DLQ, monitoring for sagas stuck longer than 5 minutes, and rate limiting on saga starts.

    Summary

    Distributed transactions across services use sagas, not 2PC. Compensating actions undo or semantically reverse forward steps. Orchestration vs choreography is a coupling vs visibility trade-off. Idempotency and outbox are mandatory for reliable sagas. Workflow engines help long-running and observable sagas. Circuit breakers protect saga participants from cascade failures.
