Event Streaming
Event streaming treats data as continuous flows of events processed in near real time, going beyond simple queue consumption.
Introduction
Stream processors such as Kafka Streams, Apache Flink, and ksqlDB go beyond simple queue consumption: they enable windowed aggregations, joins across streams, and materialized views that update as events arrive.
High-level design (HLD) use cases include fraud detection, real-time analytics dashboards, inventory sync, and CDC-driven read models. Streaming complements batch processing (for example, overnight Spark jobs) on latency-sensitive paths.
This lesson connects log-based Kafka foundations to stream processing topology and exactly-once semantics.
Understanding the topic
Key concepts
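- Stream: an unbounded sequence of events, ordered by time within each partition.
- Windowing: tumbling, sliding, and session windows bound the events that feed an aggregation.
- Stateful processing: local RocksDB stores hold keyed state, backed by changelog topics for recovery.
- Event-time vs processing-time: event time is when the event occurred, processing time is when the processor sees it. Watermarks handle late events.
- CQRS projection: a stream processor builds a read-optimized index, for example in Elasticsearch.
- Lambda vs Kappa: separate batch and speed layers vs a single stream pipeline that replays the log for all processing.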
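To make windowing and event time concrete, here is a minimal sketch in plain Java with no streaming library. The class `TumblingWindowCounter` and its methods are hypothetical names chosen for illustration. It assumes a simplified rule in which the largest event time seen so far stands in for a watermark, and a window stops accepting events once that time reaches the window end plus the grace period.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Kafka Streams or Flink.
// Counts events per tumbling event-time window and drops events that
// arrive after the window's grace period has expired.
class TumblingWindowCounter {
    private final long windowSizeMs;
    private final long graceMs;
    private final Map<Long, Long> countsByWindowStart = new TreeMap<>();
    // Largest event time seen so far: a crude stand-in for a watermark.
    private long streamTimeMs = Long.MIN_VALUE;

    TumblingWindowCounter(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    // Tumbling windows are aligned to multiples of the window size.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    // Returns true if the event was counted, false if it was dropped as too late.
    boolean accept(long eventTimeMs) {
        streamTimeMs = Math.max(streamTimeMs, eventTimeMs);
        long start = windowStart(eventTimeMs, windowSizeMs);
        long windowEnd = start + windowSizeMs;
        if (streamTimeMs >= windowEnd + graceMs) {
            return false; // window already closed
        }
        countsByWindowStart.merge(start, 1L, Long::sum);
        return true;
    }

    Map<Long, Long> counts() {
        return countsByWindowStart;
    }

    public static void main(String[] args) {
        // One-minute windows with a ten-second grace period.
        TumblingWindowCounter counter = new TumblingWindowCounter(60_000L, 10_000L);
        counter.accept(5_000L);   // window [0, 60000)
        counter.accept(61_000L);  // window [60000, 120000)
        counter.accept(59_000L);  // out of order, but within grace: counted
        counter.accept(75_000L);  // advances stream time past 70000
        boolean counted = counter.accept(58_000L); // window [0, 60000) is closed
        // prints {0=2, 60000=2} lateCounted=false
        System.out.println(counter.counts() + " lateCounted=" + counted);
    }
}
```

Real engines track watermarks per partition or per source and may route late events to a side output instead of dropping them; the sketch only shows the window assignment and the cutoff.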
Internal architecture
Architecture overview
flowchart LR
    Source --> Stream
    Stream --> Processor
    Processor --> Sink
Step-by-step explanation
- Sources: application events, database CDC (Debezium), and IoT sensors publish to Kafka topics.
- The stream processor (Flink) filters, aggregates, and joins events with an enrichment stream.
- Sinks: Elasticsearch, a Redis materialized view, or an alerting webhook.
- Changelog compaction keeps the latest state per key (KTable).
- Separate processing-guarantee tiers: at-least-once for alerts, exactly-once for billing.
- A schema registry enforces event contracts across producers and consumers.
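The idea behind changelog compaction and the table view of a stream can be sketched in plain Java. This is an illustration only: `ChangelogCompaction`, `Change`, and `materialize` are hypothetical names, and a `null` value is assumed to model a tombstone (delete) for the key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: replay a changelog and keep only the latest
// value per key, which is what a compacted topic converges toward.
class ChangelogCompaction {
    // A null value models a tombstone that deletes the key.
    record Change(String key, String value) {}

    static Map<String, String> materialize(List<Change> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Change change : changelog) {
            if (change.value() == null) {
                table.remove(change.key());
            } else {
                table.put(change.key(), change.value()); // later records win
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Change> changelog = List.of(
            new Change("sku-1", "stock=10"),
            new Change("sku-2", "stock=4"),
            new Change("sku-1", "stock=7"),
            new Change("sku-2", null));
        // prints {sku-1=stock=7}
        System.out.println(materialize(changelog));
    }
}
```

Replaying the full changelog and replaying its compacted form produce the same table, which is why a compacted changelog topic is enough to restore a state store.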
Informative example
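Kafka Streams: count orders per minute by region. A compact topology sketch in Java (Spring configuration):
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// OrderEvent and RegionOrderCount are application types defined elsewhere;
// serdes are assumed to come from the default Streams configuration.
@Configuration
public class OrderStreamConfig {
    @Bean
    public KStream<String, OrderEvent> orderStream(StreamsBuilder builder) {
        KStream<String, OrderEvent> orders = builder.stream("orders");
        orders
            // Re-key by region so that counts are computed per region.
            .groupBy((key, order) -> order.region())
            // One-minute tumbling windows with no grace period for late events.
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
            .count()
            .toStream()
            // Flatten the windowed key into "region:windowStartMillis".
            .map((windowedKey, count) -> KeyValue.pair(
                windowedKey.key() + ":" + windowedKey.window().start(),
                new RegionOrderCount(windowedKey.key(), count)))
            .to("orders-per-minute-by-region");
        return orders;
    }
}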
Interview tip: mention Flink for complex event processing (CEP) and large state, and Kafka Streams for aggregations embedded in a microservice.
Real-world use
Real-world use cases
- Fintech fraud: join the payment stream with the device-fingerprint stream in real time.
- E-commerce: live sales dashboard by region.
- OTT: concurrent viewer count per show.
- Ride-hailing: surge pricing from a join of the demand and supply streams.
Best practices
- Define the event schema and its versioning strategy upfront.
- Monitor consumer lag and state store size on stream applications.
- Handle late events with a grace period on windows.
- Use idempotent sinks, for example Elasticsearch upserts keyed by event ID.
- Test replay from the earliest offset after a logic change.
- Physically separate critical streams from analytics streams if needed.
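The idempotent-sink practice above can be sketched without any client library. In this hypothetical example an in-memory map stands in for the sink (for instance an index keyed by document ID), and `IdempotentSink`, `OrderEvent`, and `upsert` are illustrative names. Writing by event ID means a redelivered or replayed event overwrites its earlier copy instead of creating a duplicate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a sink that supports upsert by ID.
class IdempotentSink {
    record OrderEvent(String eventId, String region, long amountCents) {}

    private final Map<String, OrderEvent> documentsById = new HashMap<>();

    // Upsert keyed by event ID: applying the same event twice has no extra effect.
    void upsert(OrderEvent event) {
        documentsById.put(event.eventId(), event);
    }

    int size() {
        return documentsById.size();
    }

    long totalCents() {
        return documentsById.values().stream()
            .mapToLong(OrderEvent::amountCents)
            .sum();
    }

    public static void main(String[] args) {
        List<OrderEvent> batch = List.of(
            new OrderEvent("evt-1", "eu", 1_000L),
            new OrderEvent("evt-2", "us", 2_500L));
        IdempotentSink sink = new IdempotentSink();
        batch.forEach(sink::upsert);
        batch.forEach(sink::upsert); // simulated replay after a restart
        // prints documents=2 totalCents=3500
        System.out.println("documents=" + sink.size() + " totalCents=" + sink.totalCents());
    }
}
```

With an append-only sink the replay would double the total, which is why at-least-once delivery is usually paired with idempotent writes.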
Common mistakes
- Keeping unbounded state without retention, so RocksDB grows forever.
- Using processing-time windows when event-time skew matters.
- Joining streams that are not co-partitioned on the same key.
- Having no plan for state migration on redeploy.
- Applying exactly-once everywhere, which adds unnecessary complexity and cost.
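The co-partitioning mistake is easy to see with a small sketch. It assumes a simplified partitioner, `hashCode` modulo the partition count, as a stand-in for whatever hash the real producer applies to the serialized key; `CoPartitioningCheck` and its methods are hypothetical names. When two topics use the same key but different partition counts, records for one key can land on different partition numbers, so a task reading partition N of both topics will not see matching records.

```java
import java.util.List;

// Hypothetical illustration of why joined topics need the same key,
// the same partitioner, and the same number of partitions.
class CoPartitioningCheck {
    // Simplified partitioner: a stand-in for the producer's real key hash.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts keys whose records would sit on different partition numbers
    // in the left and right topics.
    static long countMismatches(List<String> keys, int leftPartitions, int rightPartitions) {
        return keys.stream()
            .filter(key -> partitionFor(key, leftPartitions) != partitionFor(key, rightPartitions))
            .count();
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            "user-0", "user-1", "user-2", "user-3", "user-4",
            "user-5", "user-6", "user-7", "user-8", "user-9");
        // Same partition count on both sides: no mismatches.
        System.out.println("6 vs 6 partitions, mismatched keys: " + countMismatches(keys, 6, 6));
        // Different partition counts: some keys no longer line up.
        System.out.println("4 vs 6 partitions, mismatched keys: " + countMismatches(keys, 4, 6));
    }
}
```

The usual fix is to repartition one side through an intermediate topic with the matching key and partition count before the join.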
Advanced interview questions
Q1 (Beginner): What is event streaming?
Q2 (Beginner): Kafka vs stream processor?
Q3 (Intermediate): What is CDC in streaming?
Q4 (Intermediate): Event-time vs processing-time?
Q5 (Advanced): Design real-time fraud scoring.
Summary
Event streaming processes event logs continuously to deliver real-time insights. Windowing and stateful operators power aggregations and joins. CDC syncs OLTP databases to read models via Kafka. Choose between Flink and Kafka Streams based on processing complexity and operational maturity. Consistency models govern how stale projections may be, and distributed transactions, covered next, address cross-service atomicity.