High-Level Design Tutorial, Lesson 36 (“Scalable systems, HLD interviews & case studies”)

Chat Application

Design a chat application like WhatsApp or Slack — 1:1 and group messaging, online presence, delivery receipts, and message history.


Introduction

Real-time HLD for chat centers on WebSockets for live delivery, message ordering per conversation, fan-out for groups, and mobile push for offline users.

Assume millions of concurrent connections, billions of stored messages, and at-least-once delivery with client-side deduplication. End-to-end encryption is an optional advanced topic.

Understanding the topic

Key concepts

WebSocket gateway: scales horizontally, using either connection stickiness or a shared pub/sub layer to reach the node that holds a user's connection.
Message store: Cassandra, partitioned by conversation_id and ordered within the partition by a time-based UUID.
Delivery flow: sent → delivered → read, with receipts sent as separate lightweight events.
Presence: client heartbeats refresh an entry in a Redis online set with a TTL; a last-seen timestamp covers offline users.
Group chat: fan-out on write for small groups; fan-out on read for large channels.
Push notifications (APNs/FCM) when the recipient is offline.
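
The sent → delivered → read flow can be sketched as a small forward-only state machine. This is a minimal illustration for 1:1 chats, not a definitive implementation: `DeliveryStatus` and `ReceiptTracker` are hypothetical names, and the in-memory map stands in for whatever store holds receipt state. Because delivery is at-least-once, the tracker ignores duplicate and out-of-order receipts.

```java
import java.util.HashMap;
import java.util.Map;

/** Delivery states, declared in the order a message moves through them. */
enum DeliveryStatus { SENT, DELIVERED, READ }

/** Tracks the furthest status seen per message (hypothetical helper). */
class ReceiptTracker {
    private final Map<String, DeliveryStatus> statusByMessage = new HashMap<>();

    /** Records a newly sent message; a repeated call does not reset its status. */
    void onSent(String messageId) {
        statusByMessage.putIfAbsent(messageId, DeliveryStatus.SENT);
    }

    /**
     * Applies a receipt event. Status only moves forward, so a duplicate
     * receipt, or a DELIVERED receipt arriving after READ, is ignored.
     * Returns true if the stored status changed.
     */
    boolean onReceipt(String messageId, DeliveryStatus incoming) {
        DeliveryStatus current = statusByMessage.get(messageId);
        if (current == null || incoming.compareTo(current) <= 0) {
            return false; // unknown message, duplicate, or out-of-order receipt
        }
        statusByMessage.put(messageId, incoming);
        return true;
    }

    DeliveryStatus statusOf(String messageId) {
        return statusByMessage.get(messageId);
    }
}
```

In a group chat the same idea would apply per recipient, so the key would become a (messageId, recipientId) pair.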

Internal architecture

Architecture overview

text

flowchart TB
  Client --> WS[WebSocket Gateway]
  WS --> ChatSvc
  ChatSvc --> Kafka
  Kafka --> Presence
  ChatSvc --> Cassandra

Step-by-step explanation

Connect: the client opens a WSS connection to the WebSocket Gateway cluster, which forwards requests to the Chat Service.
Send message: persist to Cassandra, then publish to a Kafka topic keyed by conversationId.
Online recipients: the gateway holding the recipient's connection is subscribed to that user's channel (Redis pub/sub or a Kafka consumer) and pushes a WS frame.
Offline recipients: the Notification service sends an FCM/APNs push, subject to the message preview policy.
Media messages: the client uploads to S3 through a presigned URL; the message body stores only the URL reference.
History sync: paginated GET /conversations/{id}/messages?before=cursor.
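
The online/offline branch in the steps above can be sketched as a small router. This is an illustration under stated assumptions: `DeliveryRouter`, `Session`, and `PushSender` are hypothetical names, the in-memory map only covers connections on a single gateway node, and a real deployment would consult a shared registry or pub/sub layer to find the node that holds the recipient's connection.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Routes a message to a live WebSocket session or to the push path (hypothetical). */
class DeliveryRouter {
    /** Stand-in for a WebSocket session held by this gateway node. */
    interface Session { void sendFrame(String payload); }

    /** Stand-in for a notification service that would call FCM or APNs. */
    interface PushSender { void push(String userId, String preview); }

    private final Map<String, Session> sessionsByUser = new ConcurrentHashMap<>();
    private final PushSender pushSender;

    DeliveryRouter(PushSender pushSender) { this.pushSender = pushSender; }

    void connect(String userId, Session session) { sessionsByUser.put(userId, session); }

    void disconnect(String userId) { sessionsByUser.remove(userId); }

    /** Returns "ws" if a live session received the frame, "push" otherwise. */
    String deliver(String recipientId, String payload) {
        Session session = sessionsByUser.get(recipientId);
        if (session != null) {
            session.sendFrame(payload);
            return "ws";
        }
        // No live connection on record: fall back to a mobile push notification.
        pushSender.push(recipientId, payload);
        return "push";
    }
}
```

The message is already persisted before this step runs, so a recipient who misses both paths can still recover it through history sync.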

Informative example

Message send API and Kafka fan-out to WebSocket delivery workers:

java

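// MessageController.java
// MessageDto, SendMessageRequest, Message, MessageRepository and ChatMessageEvent
// are application types that are not shown here.
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.security.oauth2.jwt.Jwt;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/v1/conversations/{cid}/messages")
public class MessageController {
    private final MessageService messages;

    public MessageController(MessageService messages) { this.messages = messages; }

    // The sender id comes from the authenticated JWT, never from the request body.
    @PostMapping
    public MessageDto send(@PathVariable String cid, @RequestBody SendMessageRequest req,
                           @AuthenticationPrincipal Jwt jwt) {
        return messages.send(cid, jwt.getSubject(), req.body());
    }
}

// MessageService.java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class MessageService {
    private final MessageRepository cassandra;
    private final KafkaTemplate<String, ChatMessageEvent> kafka;

    // Constructor injection: the final fields must be assigned for the class to compile.
    public MessageService(MessageRepository cassandra,
                          KafkaTemplate<String, ChatMessageEvent> kafka) {
        this.cassandra = cassandra;
        this.kafka = kafka;
    }

    public MessageDto send(String conversationId, String senderId, String body) {
        // Persist first so the message is durable before any recipient is notified.
        Message msg = cassandra.save(Message.create(conversationId, senderId, body));
        // The record key is conversationId, so one conversation maps to one partition.
        kafka.send("chat.messages", conversationId,
            new ChatMessageEvent(msg.id(), conversationId, senderId, body, msg.sentAt()));
        return MessageDto.from(msg);
    }
}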

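Keying Kafka records by conversationId sends every message in a conversation to the same partition, which preserves per-conversation order. The WS gateway scales with connection count, so deploy and scale it separately from the REST API.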
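
Why keying by conversationId gives ordering can be shown with a tiny partition function. This is a simplified sketch, not Kafka's code: Kafka's default partitioner hashes the serialized key with murmur2, while the hypothetical `ConversationPartitioner` below uses a plain array hash. The property that matters is the same in both: the partition is a pure function of the key, so every message of one conversation lands on one partition and is consumed in order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Maps a conversation id to a partition deterministically (illustrative only). */
class ConversationPartitioner {
    static int partitionFor(String conversationId, int numPartitions) {
        // Hash the key bytes; this stands in for the murmur2 hash Kafka applies.
        int hash = Arrays.hashCode(conversationId.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Ordering holds only within a partition, and changing the partition count remaps keys, so the count is usually sized generously up front.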

Real-world use

Real-world use cases

Enterprise team collaboration, Slack-style.
Secure telehealth messaging with HIPAA audit trails.
In-app buyer-seller chat for e-commerce.
Low-latency game guild chat.

Best practices

Make sends idempotent with a client-generated message UUID.
Paginate history; never load a full conversation.
Apply backpressure at the gateway when a client is a slow consumer.
Use TLS everywhere; E2EE is an optional product decision.
Run the moderation pipeline for abuse reports asynchronously.
Load test the connection count per gateway pod.
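
The first practice, idempotent sends, can be sketched as follows. The assumption here is that the client generates a UUID per message and reuses it on every retry; `IdempotentMessageStore` and `StoredMessage` are hypothetical names, and the in-memory map stands in for a conditional write in the real message store.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Stores each client message id at most once, so retries do not create duplicates. */
class IdempotentMessageStore {
    record StoredMessage(UUID clientMessageId, String conversationId, String body, long serverSeq) {}

    private final Map<UUID, StoredMessage> byClientId = new ConcurrentHashMap<>();
    private final AtomicLong seq = new AtomicLong();

    /**
     * Stores the message on first sight of clientMessageId. A retry with the
     * same id returns the originally stored message and writes nothing new.
     */
    StoredMessage send(UUID clientMessageId, String conversationId, String body) {
        return byClientId.computeIfAbsent(clientMessageId,
            id -> new StoredMessage(id, conversationId, body, seq.incrementAndGet()));
    }

    int size() { return byClientId.size(); }
}
```

A client that times out waiting for an acknowledgement can resend with the same UUID and receive the original result, which is what makes at-least-once delivery safe.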

Common mistakes

Polling over HTTP instead of using WebSockets: drains battery and adds latency.
Fan-out on write to a 10k-member channel: write amplification.
No ordering guarantee per conversation.
Storing large media in the message row: bloats the row.
A single gateway as a SPOF, with no plan for horizontal scaling.

Advanced interview questions

Q1 (Beginner): WebSocket vs long polling for chat?

WebSocket gives a full-duplex, low-latency connection, so it is the primary real-time path. Long polling is only a fallback.

Q2 (Beginner): How do you store chat messages?

Use a wide-column store partitioned by conversationId, with a time-ordered messageId inside each partition, so reading a page of history is an efficient range query.
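
The partition-plus-range-query layout can be imitated in memory to show the access pattern. This is a sketch under assumptions: `ConversationLog` is a hypothetical class, the outer map plays the role of the partition key, the sorted inner map plays the role of the time-ordered messageId, and a plain `long` stands in for a time-based UUID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory stand-in for a store partitioned by conversation, sorted by message id. */
class ConversationLog {
    record Message(long messageId, String senderId, String body) {}

    // Outer key: partition (conversationId). Inner map: messages sorted by messageId.
    private final Map<String, ConcurrentSkipListMap<Long, Message>> partitions =
        new ConcurrentHashMap<>();

    void append(String conversationId, Message m) {
        partitions.computeIfAbsent(conversationId, k -> new ConcurrentSkipListMap<>())
                  .put(m.messageId(), m);
    }

    /** Returns up to limit messages with messageId below beforeCursor, newest first. */
    List<Message> pageBefore(String conversationId, long beforeCursor, int limit) {
        List<Message> page = new ArrayList<>();
        ConcurrentSkipListMap<Long, Message> log = partitions.get(conversationId);
        if (log == null) {
            return page; // unknown conversation: empty page
        }
        // headMap(..., false) excludes the cursor itself; descendingMap walks newest first.
        for (Message m : log.headMap(beforeCursor, false).descendingMap().values()) {
            if (page.size() == limit) {
                break;
            }
            page.add(m);
        }
        return page;
    }
}
```

The client passes the smallest messageId of the previous page as the next `before` cursor, which matches the `?before=cursor` history endpoint.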

Q3 (Intermediate): Fan-out on write vs read for groups?

Fan out on write for small groups and on read for large channels (WhatsApp broadcast lists are a hybrid case).
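
The size-based choice can be sketched like this. `GroupFanOut` and its `writeFanOutLimit` threshold are hypothetical; the right cutoff depends on the workload and is not given in this lesson. Small groups get one inbox write per member, while large channels get a single write to a shared timeline that members read on demand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Chooses fan-out on write or fan-out on read by group size (illustrative). */
class GroupFanOut {
    private final int writeFanOutLimit;
    // Per-user inboxes, filled by fan-out on write.
    private final Map<String, List<String>> inboxes = new HashMap<>();
    // Per-channel timelines, read by members under fan-out on read.
    private final Map<String, List<String>> channelTimelines = new HashMap<>();

    GroupFanOut(int writeFanOutLimit) { this.writeFanOutLimit = writeFanOutLimit; }

    /** Publishes a message and returns the number of writes it cost. */
    int publish(String groupId, List<String> members, String message) {
        if (members.size() <= writeFanOutLimit) {
            // Small group: copy the message into every member's inbox.
            for (String member : members) {
                inboxes.computeIfAbsent(member, k -> new ArrayList<>()).add(message);
            }
            return members.size();
        }
        // Large channel: one write, regardless of member count.
        channelTimelines.computeIfAbsent(groupId, k -> new ArrayList<>()).add(message);
        return 1;
    }

    List<String> inboxOf(String userId) { return inboxes.getOrDefault(userId, List.of()); }

    List<String> timelineOf(String groupId) { return channelTimelines.getOrDefault(groupId, List.of()); }
}
```

The returned write count makes the trade-off visible: a 10k-member channel costs 10k writes per message under fan-out on write and one write under fan-out on read.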

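Q4 (Intermediate): How do you implement online presence?

Heartbeats refresh a per-user Redis key (user:online) with a TTL, so the key expires when heartbeats stop. Friends read presence with MGET or subscribe to presence changes over pub/sub.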
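
The TTL-heartbeat idea can be sketched without Redis. `PresenceTracker` below is a hypothetical in-memory stand-in: each heartbeat records a timestamp, and a user counts as online only while the last heartbeat is younger than the TTL. In Redis the equivalent would be a key written with an expiry (for example `SET` with the `EX` option) that disappears on its own. The injected clock is an assumption made so the expiry logic can be tested deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Presence via heartbeats and a TTL, with an injectable clock for testing. */
class PresenceTracker {
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    PresenceTracker(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Called whenever a client heartbeat arrives; refreshes the TTL window. */
    void heartbeat(String userId) {
        lastHeartbeat.put(userId, clock.getAsLong());
    }

    /** Online means the last heartbeat is strictly younger than the TTL. */
    boolean isOnline(String userId) {
        Long last = lastHeartbeat.get(userId);
        return last != null && clock.getAsLong() - last < ttlMillis;
    }

    /** Last-seen timestamp in millis, or null if the user never sent a heartbeat. */
    Long lastSeen(String userId) {
        return lastHeartbeat.get(userId);
    }
}
```

The TTL should be a small multiple of the heartbeat interval, so that one dropped heartbeat does not flip a user to offline.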

Q5 (Advanced): Design for WhatsApp scale, 2B users.

Geo-sharded WS gateways, multi-DC Cassandra, a Kafka message pipeline, E2EE with client-side keys, push via FCM/APNs, media in S3, and a fan-out strategy tiered by group size. Messages do not go through a CDN.

Summary

Chat HLD combines WebSockets, a durable message store, and push notifications. Partition messages by conversation for ordering and efficient queries. The fan-out strategy depends on group size. Presence relies on Redis TTL heartbeats. Kafka bridges persistence to the live delivery workers. By comparison, a food delivery design adds dispatch and geo-matching complexity.
