High-Level Design Tutorial 0/42 lessons ~6 min read Lesson 24

    Event Streaming

    Event streaming treats data as continuous flows of events processed in near real-time — beyond simple queue consumption.

    Course progress0%
    Focus
    10 guided sections
    Practice signal
    Examples included
    Career prep
    Interview Q&A included

    Introduction

    Event streaming treats data as continuous flows of events processed in near real-time — beyond simple queue consumption. Kafka Streams, Apache Flink, and ksqlDB enable windowed aggregations, joins across streams, and materialized views updated as events arrive.

    HLD use cases include fraud detection, real-time analytics dashboards, inventory sync, and CDC-driven read models. Streaming complements batch (Spark overnight) for latency-sensitive paths.

    This lesson connects log-based Kafka foundations to stream processing topology and exactly-once semantics.

    Understanding the topic

    Key concepts

    • Stream: unbounded sequence of events ordered by time/partition.
    • Windowing: tumbling, sliding, session windows for aggregations.
    • Stateful processing: local RocksDB stores keyed state with changelog topics.
    • Event-time vs processing-time — watermarks handle late events.
    • CQRS projection: stream processor builds read-optimized Elasticsearch index.
    • Lambda vs Kappa: batch+speed layers vs single stream replay for all.
    text
    flowchart LR
    Source --> Stream
    Stream --> Processor
    Processor --> Sink

    Internal architecture

    Architecture overview

    text
    flowchart LR
    Source --> Stream
    Stream --> Processor
    Processor --> Sink

    Step-by-step explanation

    1. Sources: app events, DB CDC (Debezium), IoT sensors → Kafka topics.
    2. Stream processor (Flink) filters, aggregates, joins enrichment stream.
    3. Sinks: Elasticsearch, Redis materialized view, alerting webhook.
    4. Changelog compaction for latest state per key (KTable).
    5. Separate processing guarantee tiers: at-least-once alerts vs exactly-once billing.
    6. Schema registry enforces event contracts across producers/consumers.

    Informative example

    Kafka Streams count orders per minute — compact topology concept in Java:

    java
    @Configuration
    public class OrderStreamConfig {
    @Bean
    public KStream<String, OrderEvent> orderStream(StreamsBuilder builder) {
    KStream<String, OrderEvent> orders = builder.stream("orders");
    orders
    .groupBy((key, order) -> order.region())
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .count()
    .toStream()
    .map((windowedKey, count) ->
    KeyValue.pair(windowedKey.key() + ":" + windowedKey.window().start(),
    new RegionOrderCount(windowedKey.key(), count)))
    .to("orders-per-minute-by-region");
    return orders;
    }
    }

    Interview: mention Flink for complex event processing (CEP) and large state; Kafka Streams for embedded microservice aggregations.

    Real-world use

    Real-world use cases

    • Fintech fraud: join payment stream with device fingerprint stream real-time.
    • E-commerce: live sales dashboard by region.
    • OTT: concurrent viewer count per show.
    • Ride-hailing: surge pricing from demand/supply stream join.

    Best practices

    • Define event schema and versioning upfront.
    • Monitor consumer lag and state store size on stream apps.
    • Handle late events with grace period in windows.
    • Idempotent sinks — ES upsert by event ID.
    • Test replay from earliest offset after logic change.
    • Separate critical and analytics streams physically if needed.

    Common mistakes

    • Unbounded state without retention — RocksDB grows forever.
    • Processing-time windows when event-time skew matters.
    • Joining streams without co-partitioning same key.
    • No plan for redeploy state migration.
    • Exactly-once everywhere — unnecessary complexity and cost.

    Advanced interview questions

    Q1BeginnerWhat is event streaming?
    Continuous processing of event logs in near real-time for aggregations, joins, and projections.
    Q2BeginnerKafka vs stream processor?
    Kafka stores/logs events; stream processor (Flink) computes windowed aggregations and joins over streams.
    Q3IntermediateWhat is CDC in streaming?
    Change Data Capture publishes DB row changes to Kafka for downstream sync.
    Q4IntermediateEvent-time vs processing-time?
    Event-time uses when event occurred; processing-time when processed — watermarks handle disorder.
    Q5AdvancedDesign real-time fraud scoring.
    Payment events to Kafka, Flink join user profile KTable, rules + ML feature window 5min, alert sink, idempotent by txnId, lag SLA 2s, DLQ for bad schema.

    Summary

    Event streaming processes logs continuously for real-time insights. Windowing and stateful operators power aggregations and joins. CDC syncs OLTP databases to read models via Kafka. Choose Flink vs Kafka Streams by complexity and ops maturity. Consistency models govern how stale projections may be. Distributed transactions address cross-service atomicity next.

    Ready to mark this lesson complete?Track your journey across the entire course.