High-Level Design Tutorial, Lesson 41

    Notification System

    Design a notification system delivering email, SMS, push, and in-app alerts at scale — order shipped, OTP login, marketing campaigns.


    Introduction

    The high-level design (HLD) for this system decouples event producers from channel delivery using queues, templates, preference management, and provider failover.

    Assume billions of notifications per month, per-user opt-out preferences, rate limits per channel, and delivery-status feedback from providers such as Twilio, FCM, and SendGrid (webhooks or status callbacks where the provider offers them).
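    To make "billions monthly" concrete, a rough throughput estimate helps size queues and workers. The sketch below is only a back-of-envelope calculation: the 10 billion figure, the 30-day month, the peak factor of 5, and the class name CapacityEstimate are all illustrative assumptions.

```java
// Back-of-envelope throughput estimate. All inputs are illustrative assumptions.
final class CapacityEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average sends per second over a month of the given length (integer division).
    static long averagePerSecond(long notificationsPerMonth, int daysInMonth) {
        return notificationsPerMonth / (daysInMonth * SECONDS_PER_DAY);
    }

    // Peak rate, modeled as a simple multiple of the average.
    static long peakPerSecond(long averagePerSecond, int peakFactor) {
        return averagePerSecond * peakFactor;
    }
}
```

    With these inputs, `averagePerSecond(10_000_000_000L, 30)` gives 3858 sends per second, and an assumed peak factor of 5 gives 19290 per second. Real traffic is burstier than a flat multiple suggests, so treat these as starting points for sizing.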

    Understanding the topic

    Key concepts

    • Event-driven: a business event such as OrderShipped reaches the notification service, which routes it by user preferences.
    • Template engine with variables and localization.
    • Channel adapters: EmailProvider, SmsProvider, and PushProvider implement a common interface.
    • Priority queues: transactional traffic such as OTPs is processed ahead of bulk marketing.
    • Idempotent send keyed by (userId, eventId, channel), so retries cannot cause duplicate OTP storms.
    • Delivery-status webhooks feed analytics and drive the retry policy.
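    The idempotency key from the list above can be sketched as follows. This is a minimal in-memory version under stated assumptions: the class IdempotencyGuard and its method names are hypothetical, and the map stands in for a shared store with expiring keys (such as Redis) that a multi-worker deployment would need.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a shared store with TTL; names are hypothetical.
final class IdempotencyGuard {
    private final Map<String, Instant> expiryByKey = new ConcurrentHashMap<>();
    private final Duration ttl;

    IdempotencyGuard(Duration ttl) {
        this.ttl = ttl;
    }

    // Returns true only for the first claim of (userId, eventId, channel) within the TTL.
    boolean firstSend(String userId, String eventId, String channel, Instant now) {
        String key = userId + ":" + eventId + ":" + channel;
        boolean[] claimed = {false};
        // compute() runs atomically per key, so concurrent retries cannot both claim it.
        expiryByKey.compute(key, (k, expiry) -> {
            if (expiry == null || !expiry.isAfter(now)) {
                claimed[0] = true;      // no live entry: claim the key
                return now.plus(ttl);
            }
            return expiry;              // live entry: duplicate, keep the old expiry
        });
        return claimed[0];
    }
}
```

    A worker would call `firstSend` before invoking the provider and skip the send when it returns false. The same event on a different channel uses a different key, so an email and an SMS for one event are both delivered.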

    Internal architecture

    Architecture overview

    text
    flowchart TB
    Event --> Kafka
    Kafka --> Router
    Router --> Push
    Router --> Email
    Router --> SMS

    Step-by-step explanation

    1. Business services publish a NotificationRequested event to Kafka.
    2. The router loads user preferences and quiet hours from a Redis cache.
    3. The router fans out to channel-specific SQS queues, separated by priority.
    4. Workers render the template, then call the SendGrid, Twilio, or FCM API.
    5. Failed sends are retried with exponential backoff; once retries are exhausted, the message goes to a dead-letter queue (DLQ) for operations to inspect.
    6. Campaign batches: a scheduler splits a 1M-user audience into rate-limited batches.
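    Step 5 above can be sketched as a small retry loop. This is an illustration under assumptions, not a production worker: RetryingSender and its members are hypothetical names, the send attempt is modeled as a predicate, the delays are returned rather than slept, and a list stands in for the DLQ.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Retry-then-dead-letter sketch; class and method names are hypothetical.
final class RetryingSender<J> {
    private final Predicate<J> sendAttempt;   // true when the provider accepts the job
    private final int maxAttempts;
    private final Duration baseDelay;
    final List<J> deadLetters = new ArrayList<>(); // stands in for a real DLQ

    RetryingSender(Predicate<J> sendAttempt, int maxAttempts, Duration baseDelay) {
        this.sendAttempt = sendAttempt;
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
    }

    // Delay before retry n (1-based): base * 2^(n-1), for example 1s, 2s, 4s.
    Duration backoff(int retryNumber) {
        return baseDelay.multipliedBy(1L << (retryNumber - 1));
    }

    // Returns the delays that were scheduled; a real worker would requeue with each delay.
    List<Duration> deliver(J job) {
        List<Duration> delays = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendAttempt.test(job)) {
                return delays;                  // success: stop retrying
            }
            if (attempt < maxAttempts) {
                delays.add(backoff(attempt));   // wait before the next attempt
            }
        }
        deadLetters.add(job);                   // retries exhausted
        return delays;
    }
}
```

    With four attempts and a one-second base delay, a job that always fails is scheduled with delays of 1s, 2s, and 4s and then lands in `deadLetters`. Production systems commonly add random jitter to each delay so that many failed jobs do not retry at the same instant.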

    Informative example

    A Kafka consumer checks user preferences and routes each notification to the channel queues; an SQS worker then sends the email:

    java
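    // Project-local types (NotificationEvent, EmailJob, EmailProvider, TemplateRenderer, ...) are assumed; one public class per file.
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;
    import org.springframework.stereotype.Service;
    import io.awspring.cloud.sqs.annotation.SqsListener; // package as in Spring Cloud AWS 3.x

    @Component
    public class NotificationRouter {
        private final UserPreferenceService prefs;
        private final QueuePublisher queues;

        public NotificationRouter(UserPreferenceService prefs, QueuePublisher queues) {
            this.prefs = prefs;
            this.queues = queues;
        }

        @KafkaListener(topics = "notifications.requested", groupId = "router")
        public void route(NotificationEvent event) {
            UserPreferences p = prefs.get(event.userId());
            // Transactional email (OTP, receipts) is sent even if marketing email is disabled.
            if (event.priority() == Priority.TRANSACTIONAL || p.emailEnabled()) {
                queues.publish("email", EmailJob.from(event));
            }
            if (p.pushEnabled()) {
                queues.publish("push", PushJob.from(event));
            }
            if (event.requiresSms() && p.smsEnabled()) {
                queues.publish("sms", SmsJob.from(event));
            }
        }
    }

    @Service
    public class EmailWorker {
        private final EmailProvider sendGrid;      // channel adapter wrapping the SendGrid client
        private final TemplateRenderer templates;  // fills a template with its variables

        public EmailWorker(EmailProvider sendGrid, TemplateRenderer templates) {
            this.sendGrid = sendGrid;
            this.templates = templates;
        }

        @SqsListener("email")
        public void send(EmailJob job) {
            sendGrid.send(templates.render(job.template(), job.vars()), job.to());
        }
    }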

    OTP SMS gets its own highest-priority queue. Every marketing email must carry an unsubscribe link. Rate-limit FCM pushes per device.
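    One way to rate-limit pushes per device is a fixed-window counter keyed by device token. The sketch below is an assumption-laden illustration: DeviceRateLimiter is a hypothetical class, the limit and window are caller-chosen values rather than FCM's own quotas, and the clock is passed in so the behavior is easy to test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Fixed-window limiter per device token; a sketch, not FCM's own quota logic.
final class DeviceRateLimiter {
    private record Window(long startMillis, int count) {}

    private final Map<String, Window> windows = new ConcurrentHashMap<>();
    private final int maxPerWindow;   // assumed to be at least 1
    private final long windowMillis;

    DeviceRateLimiter(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if a push to this device is allowed at time nowMillis.
    boolean tryAcquire(String deviceToken, long nowMillis) {
        boolean[] allowed = {false};
        windows.compute(deviceToken, (token, w) -> {
            if (w == null || nowMillis - w.startMillis() >= windowMillis) {
                allowed[0] = true;
                return new Window(nowMillis, 1);                    // start a fresh window
            }
            if (w.count() < maxPerWindow) {
                allowed[0] = true;
                return new Window(w.startMillis(), w.count() + 1);  // count this send
            }
            return w;                                               // limit reached: reject
        });
        return allowed[0];
    }
}
```

    A push worker would call `tryAcquire(deviceToken, System.currentTimeMillis())` and delay or drop the message when it returns false. A token bucket or sliding window gives smoother behavior at window boundaries; the fixed window is used here only because it is the shortest to show.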

    Real-world use

    Real-world use cases

    • E-commerce: order lifecycle emails and push notifications.
    • Banking: fraud-alert SMS sent immediately at top priority.
    • Social apps: mention notifications batched into fewer pushes.
    • Healthcare: appointment reminders across multiple channels.

    Best practices

    • Provide a user preference center so opt-outs are honored, as regulations such as CAN-SPAM and GDPR require.
    • Separate transactional and marketing infrastructure, including sending IPs.
    • Version templates and preview them before a campaign blast.
    • Monitor provider bounce and complaint rates.
    • Apply quiet hours in each user's own time zone, taken from the user profile.
    • Encrypt PII in queue payloads, or pass only an ID and look the data up at send time.
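    The quiet-hours practice above depends on converting the current instant to the user's local time before comparing. A minimal sketch using java.time follows; the QuietHours record and its fields are hypothetical, and the window is assumed to be stored as a start time, an end time, and a zone on the user profile.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;

// Quiet-hours check in the user's own time zone; names are hypothetical.
record QuietHours(LocalTime start, LocalTime end, ZoneId zone) {

    // True when 'now' falls inside [start, end) in the user's local time.
    boolean isQuiet(Instant now) {
        LocalTime local = now.atZone(zone).toLocalTime();
        if (start.equals(end)) {
            return false;                                           // empty window: never quiet
        }
        if (start.isBefore(end)) {
            return !local.isBefore(start) && local.isBefore(end);   // same-day window
        }
        return !local.isBefore(start) || local.isBefore(end);       // window crosses midnight
    }
}
```

    For a 22:00 to 07:00 window, the same instant can be quiet for a user in Asia/Kolkata and not quiet for a user in America/New_York, which is why the check cannot be done in server time. A router would likely defer marketing messages during quiet hours while still letting urgent transactional ones through.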

    Common mistakes

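    • Sending synchronously inside the checkout API: this adds latency and couples checkout to provider failures.
    • No deduplication: a retry can deliver the same push five times.
    • Sending marketing email from a transactional IP: complaints damage the reputation that transactional mail depends on.
    • Missing unsubscribe link: legal risk.
    • A single provider with no fallback: one outage takes down the whole channel.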
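    The last mistake, a single provider with no fallback, is usually addressed by putting providers behind a common adapter interface and trying them in order. The sketch below assumes a simplified SmsProvider contract (the method signature is an assumption) and a hypothetical FailoverSmsSender; real adapters would wrap each vendor's SDK.

```java
import java.util.List;

// Channel adapter contract; implementations would wrap a real provider SDK.
interface SmsProvider {
    // Returns true if the provider accepted the message; may throw on outage.
    boolean send(String to, String body);
}

// Tries providers in order and stops at the first one that accepts the message.
final class FailoverSmsSender {
    private final List<SmsProvider> providers;

    FailoverSmsSender(List<SmsProvider> providers) {
        this.providers = List.copyOf(providers);
    }

    // Returns the index of the provider that accepted the message, or -1 if all failed.
    int send(String to, String body) {
        for (int i = 0; i < providers.size(); i++) {
            try {
                if (providers.get(i).send(to, body)) {
                    return i;
                }
            } catch (RuntimeException outage) {
                // Treat an exception like a rejection and fall through to the next provider.
            }
        }
        return -1;
    }
}
```

    Returning the provider index lets the caller record which vendor handled the message. A result of -1 would hand the job back to the retry and DLQ path rather than dropping it.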

    Advanced interview questions

    Q1 (Beginner): Why queue notifications?
    Queues decouple producers from delivery, absorb traffic spikes, allow failed sends to be retried, and let the workers for each channel scale independently.
    Q2 (Beginner): Transactional vs marketing notifications?
    Transactional notifications (OTPs, receipts) are critical and get high priority. Marketing notifications are bulk, rate-limited, and run on separate queues.
    Q3 (Intermediate): How do you respect user preferences?
    Store channel opt-ins per user, have the router check them before enqueueing, and apply quiet hours in the user's time zone.
    Q4 (Intermediate): How do you prevent duplicate notifications on retry?
    Store an idempotency key (userId, eventId, channel) with a TTL and skip the send if the key already exists.
    Q5 (Advanced): Design a notification system for 10B messages/month.
    Kafka for ingest, priority queues, auto-scaled workers per channel, multi-provider failover, a template service, preferences cached in Redis, analytics in ClickHouse, a campaign scheduler that works in chunks of 100k, and a DLQ with an ops dashboard.
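    The campaign scheduler in Q5 can be illustrated with a small chunking helper. This is a sketch under assumptions: CampaignChunker and Batch are hypothetical names, the audience is addressed by list index, and rate limiting between batches is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a campaign audience into fixed-size batches; names are hypothetical.
final class CampaignChunker {
    // A half-open range [fromIndex, toIndex) into the audience list.
    record Batch(int fromIndex, int toIndex) {
        int size() {
            return toIndex - fromIndex;
        }
    }

    static List<Batch> chunk(int audienceSize, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Batch> batches = new ArrayList<>();
        for (int from = 0; from < audienceSize; from += batchSize) {
            // The last batch may be smaller than batchSize.
            batches.add(new Batch(from, Math.min(from + batchSize, audienceSize)));
        }
        return batches;
    }
}
```

    An audience of 1,000,000 users with a chunk size of 100,000 yields 10 batches. The scheduler would then enqueue one batch at a time, pacing them so the marketing queue never starves transactional traffic.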

    Summary

    The notification HLD is event-driven, multi-channel delivery. Queues buffer spikes and enable per-channel retries. Preferences and quiet hours are enforced at the router. Idempotency keys prevent duplicate alerts on retry. Transactional and marketing traffic take separate paths. The final case study integrates all HLD patterns in one interview flow.
