CAP Theorem
Introduction
The CAP theorem states that a distributed data store cannot simultaneously guarantee all three of: Consistency (every read sees the most recent write or returns an error), Availability (every request to a non-failing node receives a non-error response), and Partition tolerance (the system keeps operating even when messages between nodes are lost or delayed). During a partition, you choose CP or AP.
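Treat CAP as interview shorthand for trade-off thinking, not a blanket prohibition: the theorem only constrains behavior during a partition, and only under its strict definitions. Modern systems offer tunable consistency (e.g., QUORUM reads in Cassandra, read concern in MongoDB). Because partitions cannot be ruled out in a distributed system, partition tolerance is mandatory; the real choice is consistency versus availability during a failure.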
This lesson applies CAP to database selection, multi-region design, and explaining eventual consistency to interviewers.
Understanding the topic
Key concepts
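- Partition: a network split isolates groups of nodes, so messages between them (e.g., across AZs or regions) are lost or delayed.
- CP: reject requests or return errors to preserve consistency (ZooKeeper, etcd, a SQL primary with synchronous replication).
- AP: accept writes on both sides and resolve conflicts later (Dynamo-style stores, Cassandra tuned down to low consistency levels such as ONE).
- Eventual consistency: if no new writes arrive, replicas eventually converge; until then, reads may be stale.
- PACELC extension: if there is a Partition, choose Availability or Consistency; Else (normal operation), choose Latency or Consistency.
- Linearizability: the strongest single-object guarantee, and expensive to provide globally because every operation needs cross-region coordination.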
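To make the CP/AP distinction concrete, here is a toy Java sketch of the decision a node on the minority side of a partition faces. It is not how ZooKeeper, etcd, or Cassandra are implemented: the class `PartitionedNode`, its `Mode` enum, and the `reachable` parameter are hypothetical, and replication itself is left out so only the accept-or-refuse choice remains.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one node's write decision during a partition (hypothetical, not a real DB). */
final class PartitionedNode {
    enum Mode { CP, AP }

    private final int clusterSize; // total replicas for the data
    private final Mode mode;
    private final Map<String, String> store = new HashMap<>(); // this node's local copy only

    PartitionedNode(int clusterSize, Mode mode) {
        this.clusterSize = clusterSize;
        this.mode = mode;
    }

    /**
     * @param reachable number of replicas this node can currently contact, counting itself
     * @return true if the write was accepted
     */
    boolean write(String key, String value, int reachable) {
        int majority = clusterSize / 2 + 1;
        if (mode == Mode.CP && reachable < majority) {
            return false; // CP: return an error rather than risk a divergent history
        }
        store.put(key, value); // AP (or CP with a majority): accept; AP reconciles after healing
        return true;
    }

    String read(String key) { return store.get(key); }

    public static void main(String[] args) {
        // Five replicas split 2 | 3; both nodes below sit on the two-node side.
        PartitionedNode cp = new PartitionedNode(5, Mode.CP);
        PartitionedNode ap = new PartitionedNode(5, Mode.AP);
        System.out.println(cp.write("balance", "90", 2)); // false
        System.out.println(ap.write("balance", "90", 2)); // true
    }
}
```

The CP node on the minority side becomes unavailable for writes, while the AP node stays available and must later reconcile with whatever the majority side accepted.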
Internal architecture
Architecture overview
flowchart TB
    C[Consistency]
    A[Availability]
    P[Partition Tolerance]
    C --- P
    A --- P
Step-by-step explanation
- Payment ledger: CP — single primary, sync replica quorum, fail rather than double-spend.
- Social like count: AP — async increment, eventual display, high availability prioritized.
- Multi-region active-active: conflict resolution (LWW, vector clocks, CRDTs) for AP choice.
- Read repair and hinted handoff in Cassandra for convergence.
- SLA defines acceptable staleness — drives AP vs CP per feature.
- Health checks and fencing during partition to prevent split-brain writes.
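The like-count and CRDT bullets can be illustrated with a grow-only counter (G-Counter), one of the simplest CRDTs. This is a minimal in-memory sketch, assuming each replica has a stable ID and that replicas exchange their full state; `GCounter` is a hypothetical class, not a library API. Because merge takes the per-replica maximum, merges can be applied in any order and any number of times, and replicas still converge.

```java
import java.util.HashMap;
import java.util.Map;

/** Grow-only counter CRDT: each replica increments only its own slot. */
final class GCounter {
    private final String replicaId;
    private final Map<String, Long> counts = new HashMap<>(); // replica ID -> increments seen

    GCounter(String replicaId) { this.replicaId = replicaId; }

    /** Local, always-available write: no coordination with other replicas. */
    void increment() { counts.merge(replicaId, 1L, Long::sum); }

    /** Observed total: the sum of every replica's slot. */
    long value() { return counts.values().stream().mapToLong(Long::longValue).sum(); }

    /** State merge: per-slot maximum is commutative, associative, and idempotent. */
    void merge(GCounter other) {
        other.counts.forEach((id, n) -> counts.merge(id, n, Long::max));
    }

    public static void main(String[] args) {
        GCounter us = new GCounter("us-east");
        GCounter eu = new GCounter("eu-west");
        us.increment();
        us.increment(); // two likes land in us-east during a partition
        eu.increment(); // one like lands in eu-west
        us.merge(eu);   // partition heals: exchange state in both directions
        eu.merge(us);
        System.out.println(us.value()); // 3
        System.out.println(eu.value()); // 3
    }
}
```

A G-Counter can only grow; supporting "unlike" would need a PN-Counter, which pairs one G-Counter for increments with another for decrements.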
Informative example
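Cassandra QUORUM writes and LOCAL_QUORUM reads; consistency is tunable per request, from ONE up to ALL (Spring with the DataStax Java driver 4.x):
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

// CassandraConfig.java (each public class belongs in its own file)
@Configuration
public class CassandraConfig {
    @Bean // single shared session; Spring calls CqlSession.close() on shutdown
    public CqlSession cqlSession() {
        return CqlSession.builder()
                .addContactPoint(new InetSocketAddress("cassandra", 9042))
                .withLocalDatacenter("dc1")
                .withKeyspace("social")
                .build();
    }
}

// LikeService.java
@Service
public class LikeService {
    private final CqlSession session;

    public LikeService(CqlSession session) { this.session = session; }

    public void like(String postId, String userId) {
        // CQL has no "USING QUORUM" clause; the consistency level is set on the statement.
        session.execute(SimpleStatement.newInstance("""
                INSERT INTO likes (post_id, user_id, liked_at)
                VALUES (?, ?, toTimestamp(now()))""", postId, userId)
                .setConsistencyLevel(DefaultConsistencyLevel.QUORUM));
    }

    public long count(String postId) {
        var rs = session.execute(SimpleStatement.newInstance(
                "SELECT COUNT(*) FROM likes WHERE post_id = ?", postId)
                .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM));
        return rs.one().getLong(0);
    }
}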
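With replication factor RF, QUORUM requires floor(RF/2) + 1 replica acknowledgements. When both reads and writes reach a quorum, their replica sets must overlap (R + W > RF), so normal operation is CP-leaning. During a partition, a coordinator that cannot reach a quorum of replicas fails the request instead of returning possibly stale data, trading availability for consistency. In a multi-datacenter cluster, QUORUM counts replicas across all datacenters while LOCAL_QUORUM counts only the local one, so mixing the two does not by itself guarantee overlap. Match the consistency level to the business's tolerance for staleness.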
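The replica-overlap rule behind these consistency levels is plain arithmetic. The sketch below assumes a single datacenter and Cassandra's definition of QUORUM as floor(RF / 2) + 1; `QuorumMath` and its methods are hypothetical helpers, not driver APIs.

```java
/** Replica-count arithmetic for Cassandra-style consistency levels (single datacenter, simplified). */
final class QuorumMath {
    static int one() { return 1; }

    static int quorum(int rf) { return rf / 2 + 1; } // integer division gives floor(RF / 2)

    static int all(int rf) { return rf; }

    /** Read and write replica sets are guaranteed to intersect exactly when R + W > RF. */
    static boolean readSeesLatestWrite(int readAcks, int writeAcks, int rf) {
        return readAcks + writeAcks > rf;
    }

    /** Replicas that may be unreachable while a request needing `acks` replies still succeeds. */
    static int toleratedDownReplicas(int acks, int rf) { return rf - acks; }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(quorum(rf));                                    // 2
        System.out.println(readSeesLatestWrite(quorum(rf), quorum(rf), rf)); // true
        System.out.println(readSeesLatestWrite(one(), one(), rf));         // false
        System.out.println(toleratedDownReplicas(quorum(rf), rf));         // 1
        System.out.println(toleratedDownReplicas(all(rf), rf));            // 0
    }
}
```

With RF = 3, QUORUM on both paths tolerates one unreachable replica and still guarantees overlap; ONE on both paths tolerates two but can return stale data, which is the AP end of the dial.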
Real-world use
Real-world use cases
- Banking CP on account balances; AP on marketing preference flags.
- E-commerce inventory during partition: reserve conservatively (CP) vs oversell risk (AP).
- OTT view counts AP; DRM license tokens CP with a short TTL.
- Healthcare critical alerts CP routing; analytics dashboards AP.
Best practices
- Classify features by consistency requirement before picking stores.
- Document user-visible staleness SLOs (e.g., follower counts may lag by up to 30s).
- Use consensus systems (Raft) for small strongly consistent metadata.
- Test partition behavior with chaos engineering (iptables, Toxiproxy).
- Don't cite CAP as a blanket justification for inconsistency; be feature-specific.
- For hybrid flows, combine an AP write path with synchronous reads from the primary wherever a fresh read is required.
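The staleness-SLO and hybrid-read practices can be combined in a read router. This is a sketch under stated assumptions: replica replication lag is assumed to be observable (for example from a monitoring metric), and `StalenessRouter`, `Target`, and the 30-second budget (taken from the follower-count example) are illustrative, not part of any framework.

```java
import java.time.Duration;

/** Hypothetical read router: replica reads only while replication lag fits the staleness SLO. */
final class StalenessRouter {
    enum Target { REPLICA, PRIMARY }

    private final Duration staleBudget; // the documented, user-visible staleness SLO

    StalenessRouter(Duration staleBudget) { this.staleBudget = staleBudget; }

    /**
     * @param replicaLag    current replication lag of the candidate replica (assumed known)
     * @param needsOwnWrite true when the caller must see a write it just made
     */
    Target route(Duration replicaLag, boolean needsOwnWrite) {
        if (needsOwnWrite) {
            return Target.PRIMARY; // read-your-writes: synchronous read from the primary
        }
        // Within budget, the cheaper replica read is acceptable; otherwise fall back.
        return replicaLag.compareTo(staleBudget) <= 0 ? Target.REPLICA : Target.PRIMARY;
    }

    public static void main(String[] args) {
        StalenessRouter router = new StalenessRouter(Duration.ofSeconds(30));
        System.out.println(router.route(Duration.ofSeconds(5), false));  // REPLICA
        System.out.println(router.route(Duration.ofSeconds(45), false)); // PRIMARY
        System.out.println(router.route(Duration.ofSeconds(5), true));   // PRIMARY
    }
}
```

Keeping the budget per feature, rather than global, mirrors the advice to classify features by consistency requirement.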
Common mistakes
- Claiming 'we are CAP compliant' without specifying which letter under partition.
- Using AP store for money without idempotency and reconciliation.
- Dismissing network partitions as rare; they happen during deploys and AZ failures.
- Confusing CAP consistency (linearizability) with ACID isolation levels.
- Designing for a single datacenter and ignoring partition tolerance until multi-region is requested.
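For the money-in-an-AP-store mistake, the first safeguard is making each operation idempotent. Below is a minimal single-node sketch, assuming clients attach a unique idempotency key to every debit; `IdempotentLedger` and its `Result` enum are hypothetical. It only stops the same request from being applied twice: two partitioned replicas could still each accept different debits and jointly overdraw the account, which is why reconciliation remains necessary.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical single-node ledger that deduplicates debits by idempotency key. */
final class IdempotentLedger {
    enum Result { APPLIED, DUPLICATE, INSUFFICIENT_FUNDS }

    private final Set<String> appliedKeys = new HashSet<>();
    private long balanceCents; // integer cents avoid floating-point rounding on money

    IdempotentLedger(long openingBalanceCents) { this.balanceCents = openingBalanceCents; }

    Result debit(String idempotencyKey, long amountCents) {
        if (appliedKeys.contains(idempotencyKey)) {
            return Result.DUPLICATE; // retry or replayed message: never charge twice
        }
        if (amountCents > balanceCents) {
            return Result.INSUFFICIENT_FUNDS; // key not recorded, so a later retry may succeed
        }
        balanceCents -= amountCents;
        appliedKeys.add(idempotencyKey);
        return Result.APPLIED;
    }

    long balanceCents() { return balanceCents; }

    public static void main(String[] args) {
        IdempotentLedger ledger = new IdempotentLedger(10_000);
        System.out.println(ledger.debit("tx-1", 3_000)); // APPLIED
        System.out.println(ledger.debit("tx-1", 3_000)); // DUPLICATE
        System.out.println(ledger.balanceCents());       // 7000
    }
}
```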
Advanced interview questions
Q1 (Beginner): What does CAP stand for?
Q2 (Beginner): Why is partition tolerance mandatory in distributed systems?
Q3 (Intermediate): CP vs AP example?
Q4 (Intermediate): What is PACELC?
Q5 (Advanced): Design a multi-region product catalog.
Summary
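CAP forces a choice between consistency and availability during network partitions. Partition tolerance is non-negotiable in distributed high-level design (HLD). Choose CP or AP per feature, not for the whole system. PACELC extends the trade-off to latency versus consistency under normal conditions. Eventual consistency requires the product to accept, and document, some staleness. Caches such as Redis add their own consistency behavior (for example, serving stale entries until invalidation or expiry) on top of the underlying store's CAP choice.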