High-Level Design Tutorial 0/42 lessons ~6 min read Lesson 18

    Partitioning

    Partitioning divides one logical table into smaller physical pieces (partitions) within a single database instance — by range (date), list (region), or hash.

    Course progress0%
    Focus
    10 guided sections
    Practice signal
    Examples included
    Career prep
    Interview Q&A included

    Introduction

    Partitioning divides one logical table into smaller physical pieces (partitions) within a single database instance — by range (date), list (region), or hash. Unlike sharding across servers, partitioning stays on one node (or one shard) but improves query pruning, maintenance, and archival.

    PostgreSQL declarative partitioning and MySQL PARTITION BY RANGE are common. HLD interviews use partitioning for time-series logs, orders by month, and dropping old data instantly via DROP PARTITION.

    This lesson distinguishes partitioning from sharding and when both apply together.

    Understanding the topic

    Key concepts

    • Range partitioning: created_at monthly — queries with date filter scan one partition.
    • List partitioning: country_code IN ('US','CA') → partition americas.
    • Hash partitioning: even split when no natural range — limited pruning benefit.
    • Partition pruning: optimizer skips irrelevant partitions — critical for performance.
    • Maintenance: detach/drop old partitions cheaper than DELETE millions of rows.
    • Partitioning vs sharding: partition = single DB; shard = multiple DB instances.
    text
    flowchart TB
    Table --> P2024[Partition 2024]
    Table --> P2025[Partition 2025]
    Table --> P2026[Partition 2026]

    Internal architecture

    Architecture overview

    text
    flowchart TB
    Table --> P2024[Partition 2024]
    Table --> P2025[Partition 2025]
    Table --> P2026[Partition 2026]

    Step-by-step explanation

    1. Hot recent partition on fast storage; cold partitions on cheaper tier or archived to S3.
    2. Application queries include partition key in WHERE for pruning.
    3. Global indexes across partitions may be limited — design PK including partition key.
    4. Scheduled job creates future partitions before month rollover.
    5. Retention: DROP PARTITION older than 7 years for compliance.
    6. At extreme scale: partition within each shard (shard by user, partition by month).

    Informative example

    PostgreSQL range partitioning for orders by month — SQL DDL and Spring query hint:

    java
    -- DDL (run via migration)
    CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    total NUMERIC(12,2)
    ) PARTITION BY RANGE (created_at);
    CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
    // Repository query must filter created_at for pruning
    @Query("""
    SELECT o FROM Order o
    WHERE o.userId = :userId
    AND o.createdAt >= :from AND o.createdAt < :to
    """)
    List<Order> findUserOrdersInRange(String userId, Instant from, Instant to);

    Interview: 'Partition orders by month for query speed and cheap retention; shard by userId when single instance fills.'

    Real-world use

    Real-world use cases

    • E-commerce order history queries last 90 days — prune old partitions.
    • Banking audit logs: monthly partitions with 7-year retention policy.
    • IoT telemetry time-series in PostgreSQL/Timescale before moving cold to S3.
    • Healthcare event logs partitioned by date for HIPAA retention deletes.

    Best practices

    • Always include partition key in queries needing performance.
    • Automate partition creation — missing partition breaks inserts.
    • Monitor partition sizes — split when one partition too large.
    • Test EXPLAIN plans confirm partition pruning.
    • Combine with indexes on partition key + filter columns.
    • Archive detached partitions to object storage for cost.

    Common mistakes

    • Queries without partition key — full table scan all partitions.
    • Forgetting to pre-create next month's partition.
    • Primary key not including partition key — routing issues in some engines.
    • Confusing table partitioning with application-level sharding only.
    • Too many tiny partitions — metadata overhead.

    Advanced interview questions

    Q1BeginnerWhat is table partitioning?
    Splitting one logical table into physical segments by range, list, or hash within a database.
    Q2BeginnerPartitioning vs sharding?
    Partitioning is within one DB instance; sharding spreads across multiple instances.
    Q3IntermediateWhy partition by date for logs?
    Query pruning on recent data and cheap retention via DROP PARTITION.
    Q4IntermediateWhat is partition pruning?
    Optimizer skips scanning partitions not matching query filters on partition key.
    Q5AdvancedDesign storage for 5B events/year.
    Monthly range partitions, 24-month hot in PostgreSQL, detach cold to S3 Parquet, query via Athena, shard if single node exceeds TB — partition key event_time.

    Summary

    Partitioning splits tables internally for performance and maintenance. Range by date is classic for orders, logs, events. Queries must use partition key for pruning benefits. DROP PARTITION beats DELETE for retention at scale. Combine with sharding when single instance limits hit. CAP theorem frames consistency trade-offs across distributed stores.

    Ready to mark this lesson complete?Track your journey across the entire course.