Partitioning
Partitioning divides one logical table into smaller physical pieces (partitions) within a single database instance — by range (date), list (region), or hash.
Introduction
Partitioning divides one logical table into smaller physical pieces (partitions) within a single database instance — by range (date), list (region), or hash. Unlike sharding across servers, partitioning stays on one node (or one shard) but improves query pruning, maintenance, and archival.
PostgreSQL declarative partitioning and MySQL PARTITION BY RANGE are common. HLD interviews use partitioning for time-series logs, orders by month, and dropping old data instantly via DROP PARTITION.
This lesson distinguishes partitioning from sharding and when both apply together.
Understanding the topic
Key concepts
- Range partitioning: created_at monthly — queries with date filter scan one partition.
- List partitioning: country_code IN ('US','CA') → partition americas.
- Hash partitioning: even split when no natural range — limited pruning benefit.
- Partition pruning: optimizer skips irrelevant partitions — critical for performance.
- Maintenance: detach/drop old partitions cheaper than DELETE millions of rows.
- Partitioning vs sharding: partition = single DB; shard = multiple DB instances.
flowchart TBTable --> P2024[Partition 2024]Table --> P2025[Partition 2025]Table --> P2026[Partition 2026]
Internal architecture
Architecture overview
flowchart TBTable --> P2024[Partition 2024]Table --> P2025[Partition 2025]Table --> P2026[Partition 2026]
Step-by-step explanation
- Hot recent partition on fast storage; cold partitions on cheaper tier or archived to S3.
- Application queries include partition key in WHERE for pruning.
- Global indexes across partitions may be limited — design PK including partition key.
- Scheduled job creates future partitions before month rollover.
- Retention: DROP PARTITION older than 7 years for compliance.
- At extreme scale: partition within each shard (shard by user, partition by month).
Informative example
PostgreSQL range partitioning for orders by month — SQL DDL and Spring query hint:
-- DDL (run via migration)CREATE TABLE orders (id UUID PRIMARY KEY,user_id UUID NOT NULL,created_at TIMESTAMPTZ NOT NULL,total NUMERIC(12,2)) PARTITION BY RANGE (created_at);CREATE TABLE orders_2026_01 PARTITION OF ordersFOR VALUES FROM ('2026-01-01') TO ('2026-02-01');// Repository query must filter created_at for pruning@Query("""SELECT o FROM Order oWHERE o.userId = :userIdAND o.createdAt >= :from AND o.createdAt < :to""")List<Order> findUserOrdersInRange(String userId, Instant from, Instant to);
Interview: 'Partition orders by month for query speed and cheap retention; shard by userId when single instance fills.'
Real-world use
Real-world use cases
- E-commerce order history queries last 90 days — prune old partitions.
- Banking audit logs: monthly partitions with 7-year retention policy.
- IoT telemetry time-series in PostgreSQL/Timescale before moving cold to S3.
- Healthcare event logs partitioned by date for HIPAA retention deletes.
Best practices
- Always include partition key in queries needing performance.
- Automate partition creation — missing partition breaks inserts.
- Monitor partition sizes — split when one partition too large.
- Test EXPLAIN plans confirm partition pruning.
- Combine with indexes on partition key + filter columns.
- Archive detached partitions to object storage for cost.
Common mistakes
- Queries without partition key — full table scan all partitions.
- Forgetting to pre-create next month's partition.
- Primary key not including partition key — routing issues in some engines.
- Confusing table partitioning with application-level sharding only.
- Too many tiny partitions — metadata overhead.
Advanced interview questions
Q1BeginnerWhat is table partitioning?
Q2BeginnerPartitioning vs sharding?
Q3IntermediateWhy partition by date for logs?
Q4IntermediateWhat is partition pruning?
Q5AdvancedDesign storage for 5B events/year.
Summary
Partitioning splits tables internally for performance and maintenance. Range by date is classic for orders, logs, events. Queries must use partition key for pruning benefits. DROP PARTITION beats DELETE for retention at scale. Combine with sharding when single instance limits hit. CAP theorem frames consistency trade-offs across distributed stores.