Notification System
Design a notification system delivering email, SMS, push, and in-app alerts at scale — order shipped, OTP login, marketing campaigns.
Introduction
The high-level design (HLD) decouples event producers from channel delivery using queues, templates, preference management, and provider failover.
Assume billions of notifications monthly, user opt-out preferences, rate limits per channel, and delivery tracking webhooks from Twilio/FCM/SendGrid.
Understanding the topic
Key concepts
- Event-driven: OrderShipped → Notification service routes by user preferences.
- Template engine with variables and localization.
- Channel adapters: EmailProvider, SmsProvider, PushProvider interface.
- Priority queues: transactional traffic such as OTP is served ahead of bulk marketing.
- Idempotent send by (userId, eventId, channel) — no duplicate OTP storms.
- Delivery status webhooks update analytics and retry policy.
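The idempotent-send rule above can be sketched as a guard around the send call. This is a minimal in-memory illustration, not a definitive implementation: `IdempotentSender` and `SendKey` are hypothetical names, and a real deployment would more likely keep the keys in a shared store with an expiry (for example Redis) so that all workers see the same set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of send-side deduplication keyed by (userId, eventId, channel). */
final class IdempotentSender {
    /** Hypothetical key type; the field names are illustrative. */
    record SendKey(String userId, String eventId, String channel) {}

    // In-memory stand-in for a shared key store.
    private final Set<SendKey> seen = ConcurrentHashMap.newKeySet();

    /**
     * Runs the send action only the first time a key is seen.
     * Returns true if the send ran, false if it was skipped as a duplicate.
     */
    boolean sendOnce(SendKey key, Runnable send) {
        if (!seen.add(key)) {
            return false; // same event redelivered for this channel: skip
        }
        try {
            send.run();
            return true;
        } catch (RuntimeException e) {
            seen.remove(key); // release the key so a later retry can send
            throw e;
        }
    }
}
```

Calling `sendOnce` twice with the same key runs the action once; the same user and event on a different channel counts as a separate send. Claiming the key before sending favors "no duplicates" over "never lose a message" if a worker crashes mid-send, which is usually the right trade for OTPs.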
Internal architecture
Architecture overview
flowchart TB
    Event --> Kafka
    Kafka --> Router
    Router --> Push
    Router --> Email
    Router --> SMS
Step-by-step explanation
- Business services publish NotificationRequested to Kafka.
- Router loads user preferences + quiet hours from Redis cache.
- Fan-out to channel-specific SQS queues with priority.
- Workers render template → call SendGrid/Twilio/FCM API.
- Failed sends retry with exponential backoff, then move to a dead-letter queue (DLQ) for ops review.
- Campaign batch: scheduler chunks 1M users into rate-limited batches.
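The retry step in the list above can be sketched as follows. This is a simplified illustration under stated assumptions: `RetryingSender` is a hypothetical class, the provider and the DLQ are modeled as plain `Consumer` callbacks, and the worker sleeps between attempts. A queue-based worker would more likely re-enqueue the job with a delay instead of blocking, and would add random jitter to the delay.

```java
import java.time.Duration;
import java.util.function.Consumer;

/** Sketch: retry a send with exponential backoff, then hand the job to a dead-letter queue. */
final class RetryingSender<J> {
    private final int maxAttempts;
    private final Duration baseDelay;
    private final Duration maxDelay;
    private final Consumer<J> deadLetterQueue; // hypothetical DLQ publisher

    RetryingSender(int maxAttempts, Duration baseDelay, Duration maxDelay,
                   Consumer<J> deadLetterQueue) {
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
        this.deadLetterQueue = deadLetterQueue;
    }

    /** Delay after failed attempt number `attempt` (1-based): base * 2^(attempt-1), capped. */
    Duration delayFor(int attempt) {
        int shift = Math.max(0, Math.min(attempt - 1, 30)); // bound the shift to avoid overflow
        Duration delay = baseDelay.multipliedBy(1L << shift);
        return delay.compareTo(maxDelay) > 0 ? maxDelay : delay;
    }

    /** Returns true if the job was sent, false if it ended up in the DLQ. */
    boolean send(J job, Consumer<J> provider) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                provider.accept(job);
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // retries exhausted
                }
                try {
                    Thread.sleep(delayFor(attempt).toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // shutting down: stop retrying
                }
            }
        }
        deadLetterQueue.accept(job);
        return false;
    }
}
```

With a base delay of 1 s and a cap of 60 s, the waits after successive failures are 1 s, 2 s, 4 s, and so on up to the cap.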
Informative example
Kafka consumer routes notification to channel queues with preference check:
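// Two classes shown together for brevity; each public class lives in its own file.
// UserPreferenceService, QueuePublisher, the *Job types, and EmailProvider are application types not shown here.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
// @SqsListener comes from Spring Cloud AWS; its package depends on the library version.

@Component
public class NotificationRouter {
    private final UserPreferenceService prefs;
    private final QueuePublisher queues;

    public NotificationRouter(UserPreferenceService prefs, QueuePublisher queues) {
        this.prefs = prefs;
        this.queues = queues;
    }

    @KafkaListener(topics = "notifications.requested", groupId = "router")
    public void route(NotificationEvent event) {
        UserPreferences p = prefs.get(event.userId());
        // Transactional email is sent regardless of the email opt-out; marketing email needs consent.
        if (event.priority() == Priority.TRANSACTIONAL || p.emailEnabled()) {
            queues.publish("email", EmailJob.from(event));
        }
        if (p.pushEnabled()) {
            queues.publish("push", PushJob.from(event));
        }
        // SMS only when the event asks for it and the user allows it.
        if (event.requiresSms() && p.smsEnabled()) {
            queues.publish("sms", SmsJob.from(event));
        }
    }
}

@Service
public class EmailWorker {
    private final EmailProvider sendGrid; // channel adapter wrapping the SendGrid client

    public EmailWorker(EmailProvider sendGrid) {
        this.sendGrid = sendGrid;
    }

    @SqsListener("email")
    public void send(EmailJob job) {
        // render(...) is the template-engine call, assumed to be defined elsewhere.
        sendGrid.send(render(job.template(), job.vars()), job.to());
    }
}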
OTP SMS gets the highest priority on a separate queue. Marketing email must carry an unsubscribe link. Rate-limit FCM sends per device.
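The separate-queue idea can be illustrated with two in-process lanes and strict priority between them. This is only a sketch: `TwoLaneDispatcher` is a hypothetical class, jobs are plain strings, and the class is not thread-safe. In the architecture described here the lanes would be separate SQS queues with their own worker pools.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Sketch: a transactional lane that is always drained before the marketing lane. */
final class TwoLaneDispatcher {
    private final Queue<String> transactional = new ArrayDeque<>(); // OTP, fraud alerts
    private final Queue<String> marketing = new ArrayDeque<>();     // bulk campaigns

    void submitTransactional(String job) {
        transactional.add(job);
    }

    void submitMarketing(String job) {
        marketing.add(job);
    }

    /** Next job to dispatch, or null when both lanes are empty. */
    String next() {
        String job = transactional.poll();
        return job != null ? job : marketing.poll();
    }
}
```

An OTP submitted after a thousand campaign messages is still dispatched first. Strict priority can starve the marketing lane under sustained transactional load, which is one reason dedicated workers per lane are a common alternative.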
Real-world use
Real-world use cases
- E-commerce order lifecycle emails and push.
- Banking fraud alerts sent by SMS at immediate priority.
- Social mention push notifications, batched to avoid flooding the user.
- Healthcare appointment reminders across multiple channels.
Best practices
- User preference center — legal opt-out compliance (CAN-SPAM, GDPR).
- Separate transactional vs marketing infrastructure and IPs.
- Template versioning and preview before campaign blast.
- Monitor provider bounce/complaint rates.
- Quiet hours are evaluated in each user's own time zone, taken from the user profile.
- Encrypt PII in queue payloads or reference by ID only.
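The quiet-hours practice above comes down to converting the current instant into the user's local time before comparing it with the window. A minimal sketch using `java.time`, assuming a hypothetical `QuietHours` class with an inclusive start and exclusive end:

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;

/** Sketch: decide whether an instant falls inside a user's local quiet window. */
final class QuietHours {
    private final LocalTime start; // for example 22:00 local time
    private final LocalTime end;   // for example 08:00 local time

    QuietHours(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /** True if `now` is inside the window in the given zone (start inclusive, end exclusive). */
    boolean isQuiet(Instant now, ZoneId userZone) {
        LocalTime local = now.atZone(userZone).toLocalTime();
        if (start.equals(end)) {
            return false; // empty window
        }
        if (start.isBefore(end)) {
            return !local.isBefore(start) && local.isBefore(end);
        }
        // Window wraps past midnight, for example 22:00 to 08:00.
        return !local.isBefore(start) || local.isBefore(end);
    }
}
```

For a 22:00 to 08:00 window, the instant 2024-01-15T03:30:00Z is quiet for a user in UTC (03:30 local) and in America/New_York (22:30 local), but not for a user in Asia/Kolkata (09:00 local). The router can hold non-urgent notifications until the window ends while still letting transactional alerts through.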
Common mistakes
- Synchronous send in checkout API — latency and failure coupling.
- No dedup: the user gets five identical push notifications on retry.
- Marketing email from transactional IP — reputation damage.
- Missing unsubscribe — legal risk.
- Single provider with no fallback: one provider outage takes down the whole channel.
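The single-provider mistake is usually addressed by putting an ordered list of providers behind the channel adapter interface. The sketch below assumes a hypothetical `SmsProvider` interface whose `send` throws a `RuntimeException` on failure; real provider SDKs have their own client types and error models, which an adapter would translate.

```java
import java.util.List;

/** Sketch: try SMS providers in order until one accepts the message. */
final class FailoverSmsSender {
    /** Hypothetical channel adapter interface. */
    interface SmsProvider {
        String name();

        void send(String to, String body); // throws RuntimeException on failure
    }

    private final List<SmsProvider> providers; // ordered: primary first

    FailoverSmsSender(List<SmsProvider> providers) {
        this.providers = List.copyOf(providers);
    }

    /** Returns the name of the provider that accepted the message. */
    String send(String to, String body) {
        RuntimeException last = null;
        for (SmsProvider provider : providers) {
            try {
                provider.send(to, body);
                return provider.name();
            } catch (RuntimeException e) {
                last = e; // fall through to the next provider
            }
        }
        throw new IllegalStateException("all SMS providers failed", last);
    }
}
```

Failover interacts with deduplication: if the primary accepted the message but the call timed out, the backup may deliver it a second time, so the idempotency key should cover the logical send rather than a single provider call.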
Advanced interview questions
Q1 (Beginner): Why queue notifications?
Q2 (Beginner): How do transactional and marketing notifications differ?
Q3 (Intermediate): How do you respect user preferences?
Q4 (Intermediate): How do you prevent duplicate notifications on retry?
Q5 (Advanced): Design a notification system for 10B messages/month.
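A reasonable way to open Q5 is a back-of-envelope rate estimate. The sketch below is only that: `CapacityEstimate` is a hypothetical helper, it assumes a 30-day month, and the peak multiplier and per-worker throughput passed to `workersNeeded` are placeholders to be replaced with measured values.

```java
/** Sketch: rough sizing from a monthly message volume. */
final class CapacityEstimate {
    /** Average messages per second, assuming a 30-day month. */
    static double averagePerSecond(long messagesPerMonth) {
        long secondsPerMonth = 30L * 24 * 60 * 60; // 2,592,000
        return (double) messagesPerMonth / secondsPerMonth;
    }

    /** Workers needed at peak, given an assumed peak multiplier and per-worker throughput. */
    static long workersNeeded(long messagesPerMonth, double peakFactor, double perWorkerPerSecond) {
        double peakRate = averagePerSecond(messagesPerMonth) * peakFactor;
        return (long) Math.ceil(peakRate / perWorkerPerSecond);
    }
}
```

For 10B messages/month the average is about 3,858 messages per second. With an assumed 5x peak and 200 sends per second per worker, `workersNeeded(10_000_000_000L, 5.0, 200.0)` comes to 97 workers, before headroom for retries and provider rate limits.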
Summary
A notification HLD is event-driven, multi-channel delivery. Queues buffer spikes and enable per-channel retries. The router enforces preferences and quiet hours. Idempotency prevents duplicate alerts on retries. Transactional and marketing traffic take separate paths. As the final case study, it integrates all the HLD patterns into one interview flow.