Redis Tutorial: “In-memory data structures to distributed systems”, Lesson 40 of 42

Troubleshooting Redis


Introduction

Ninety percent of Redis incidents fall into eight buckets: OOM/eviction, hot keys, slow commands, replication lag, stampede, penetration, avalanche, and fragmentation. Systematic diagnosis beats random CONFIG changes.

Start every latency ticket with SLOWLOG GET and INFO memory. Correlate what they show with deploy times, traffic spikes, and backup windows (fork latency).

Document each fix in the postmortem: the same hot key will return next sale season.

Understanding the topic

Key concepts

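Cache stampede: many requests miss the same hot key the moment it expires and all hit the backend at once (thundering herd).
Cache penetration: queries for keys that exist in neither the cache nor the database, so every request falls through to the database.
Cache avalanche: a large set of keys expires at the same time, so the miss rate jumps all at once.
Hot key: a single key receives enough traffic to overload the one shard that owns it.
Big key: a very large value; DEL or GET on it blocks the single-threaded event loop.
Fragmentation: RSS keeps growing although the dataset does not.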
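Fragmentation is the easiest of these to confirm from numbers Redis already reports. A minimal sketch that reads `redis-cli INFO memory` output on stdin and flags a high `mem_fragmentation_ratio`; the function name and the 1.5 threshold are assumptions for illustration, not Redis constants:

```bash
# Read `redis-cli INFO memory` output on stdin and report fragmentation.
# Optional first argument: ratio threshold (assumed default 1.5).
frag_report() {
  tr -d '\r' | awk -F: -v limit="${1:-1.5}" '
    $1 == "used_memory"             { used = $2 }
    $1 == "used_memory_rss"         { rss = $2 }
    $1 == "mem_fragmentation_ratio" { ratio = $2 }
    END {
      if (ratio == "") { print "unknown"; exit }
      status = (ratio + 0 > limit + 0) ? "fragmented" : "ok"
      printf "%s ratio=%s used=%s rss=%s\n", status, ratio, used, rss
    }'
}
```

Typical use would be `redis-cli INFO memory | frag_report`. A high ratio with flat `used_memory` points at fragmentation rather than data growth.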

Step-by-step explanation

Symptom: record what users see (latency, errors, evictions).
Check: SLOWLOG, INFO, and recent changes.
Classify: memory, hot key, network, or fork.
Mitigate: TTL jitter, a local cache, UNLINK for big keys.
Long-term: shard the data, pipeline requests, fix the code.
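The classify step can be partly automated from INFO counters. The sketch below covers only the memory and fork classes (hot keys and network problems need other evidence); the function name and every threshold are illustrative assumptions to tune per deployment. Note that `evicted_keys` and `rejected_connections` are cumulative since startup or the last CONFIG RESETSTAT:

```bash
# Classify an incident from `redis-cli INFO` output read on stdin.
# Prints one hint per suspicious signal; prints nothing when all look healthy.
classify_incident() {
  tr -d '\r' | awk -F: '
    $1 == "evicted_keys"            { evicted = $2 + 0 }
    $1 == "latest_fork_usec"        { fork_us = $2 + 0 }
    $1 == "mem_fragmentation_ratio" { frag = $2 + 0 }
    $1 == "rejected_connections"    { rejected = $2 + 0 }
    END {
      if (evicted > 0)      print "memory: keys are being evicted"
      if (frag > 1.5)       print "memory: fragmentation ratio " frag
      if (fork_us > 100000) print "fork: last fork took " fork_us " usec"
      if (rejected > 0)     print "clients: connections rejected (maxclients)"
    }'
}
```

Run it as `redis-cli INFO | classify_incident` and treat the output as a starting point for the runbook, not a verdict.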

Syntax reference

Common commands

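--hotkeys scans the keyspace and reads each key's access frequency, so it needs an LFU maxmemory-policy and is not free on a busy instance.
Match the incident timestamp against the BGSAVE schedule.
Check whether clients were disconnected for exceeding their output buffer limits.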

bash

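redis-cli SLOWLOG GET 20        # 20 most recent slow commands
redis-cli --hotkeys -i 0.1      # needs an LFU maxmemory-policy
redis-cli INFO memory
redis-cli INFO stats
redis-cli LATENCY DOCTOR        # needs latency-monitor-threshold > 0
redis-cli MEMORY DOCTOR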
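Matching an incident to a BGSAVE can be done from `INFO persistence`. A rough sketch, assuming `rdb_last_save_time` marks when the last successful save finished and `rdb_last_bgsave_time_sec` is how long it ran; the function name and the 60-second slack are hypothetical. It only sees the most recent save, so run it soon after the incident:

```bash
# usage: redis-cli INFO persistence | bgsave_overlap <incident-epoch> [slack-sec]
# Prints overlaps-bgsave when the incident falls inside the last save window.
bgsave_overlap() {
  tr -d '\r' | awk -F: -v t="$1" -v w="${2:-60}" '
    $1 == "rdb_last_save_time"       { saved = $2 + 0 }
    $1 == "rdb_last_bgsave_time_sec" { dur = $2 + 0 }
    END {
      if (saved == 0) { print "unknown"; exit }
      if (dur < 0) dur = 0            # -1 means no background save has run
      start = saved - dur - w         # approximate fork time, minus slack
      stop  = saved + w
      verdict = (t + 0 >= start && t + 0 <= stop) ? "overlaps-bgsave" : "no-overlap"
      print verdict
    }'
}
```

If the verdict is `overlaps-bgsave`, check `latest_fork_usec` in INFO stats next.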

Informative example

Hot key incident playbook — identify and mitigate:

bash

# 1. Confirm the hot key (requires an LFU maxmemory-policy)
redis-cli --hotkeys -i 0.1

# 2. Check the key's size in bytes
redis-cli MEMORY USAGE viral:product:sku

# 3. Mitigate: deploy an in-process (L1) cache for this key in the app
# 4. Long-term: spread reads across replicas or across duplicated copies of the key

A celebrity product drop is a textbook hot key. Pre-warm the cache and enable the local cache before the event, and agree on key design with the product team ahead of time.
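The duplicate-key idea from step 4 can be sketched as follows: write the same value under several names and let each reader pick one at random, so reads spread across copies (and, in Redis Cluster, across hash slots). The `:copy:<i>` suffix scheme and both function names are hypothetical:

```bash
# Random 16-bit integer from /dev/urandom (portable replacement for $RANDOM).
rand16() { od -An -N2 -tu2 /dev/urandom | tr -d ' \n'; }

# usage: hot_key_copy <base-key> <copies>
# Prints the name of one randomly chosen copy to read from.
hot_key_copy() {
  echo "$1:copy:$(( $(rand16) % $2 ))"
}

# usage: all_copies <base-key> <copies>
# Prints every copy name, one per line, e.g. to rewrite them all on update.
all_copies() (
  i=0
  while [ "$i" -lt "$2" ]; do
    echo "$1:copy:$i"
    i=$((i + 1))
  done
)
```

A reader would then run something like `redis-cli GET "$(hot_key_copy viral:product:sku 8)"`. The cost is that every write must update all copies, so this suits read-heavy keys that change rarely.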

Real-world use

Real-world use cases

P99 latency spike after a marketing email.
Pod restarts after Redis is OOM-killed.
Recovery from an accidental FLUSHDB.
MOVED redirect storm in Cluster caused by a misconfigured client.
Session stampede after a deploy changes TTLs.

Best practices

Keep a runbook for each incident class.
Never run KEYS * while debugging in production; use SCAN instead.
Add jitter to TTLs to prevent avalanches.
Use a Bloom filter or negative caching against penetration.
Delete big keys with UNLINK during cleanup.
Simulate hot keys on game days.
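TTL jitter is a one-line change once a helper exists. A minimal sketch that spreads a base TTL uniformly over plus or minus a percentage; the function name and the 10 percent default are assumptions:

```bash
# usage: jittered_ttl <base-seconds> [pct, default 10]
# Prints a TTL in seconds drawn uniformly from base ± pct percent.
jittered_ttl() (
  base=$1
  pct=${2:-10}
  spread=$(( base * pct / 100 ))
  # A TTL too small to spread is returned unchanged.
  [ "$spread" -gt 0 ] || { echo "$base"; exit 0; }
  r=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
  echo $(( base - spread + r % (2 * spread + 1) ))
)
```

For example, `redis-cli SET product:42 "$payload" EX "$(jittered_ttl 3600)"` gives each key a lifetime between 3240 and 3960 seconds, so keys written together no longer expire together.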

Common mistakes

Restarting Redis without diagnosis, which destroys the evidence.
Raising maxmemory without fixing the TTL leak.
Running MONITOR in production, which adds load.
Blaming the network without checking SLOWLOG.

Advanced interview questions

Q1 (Beginner): What is a cache stampede?

Many requests miss the same expired hot key at the same moment and all hammer the database.
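One common guard, sketched here under assumptions: only the caller that wins a short-lived lock rebuilds the value, and everyone else backs off and re-reads the cache. The `REDIS_CLI` variable, the `lock:` prefix, the function name, and the 10-second lock TTL are all hypothetical choices:

```bash
REDIS_CLI=${REDIS_CLI:-redis-cli}   # assumed to be on PATH

# usage: try_rebuild_lock <cache-key> <owner-token>
# Succeeds (exit 0) only for the first caller within the lock's lifetime.
try_rebuild_lock() {
  reply=$($REDIS_CLI SET "lock:$1" "$2" NX EX 10)
  [ "$reply" = "OK" ]   # SET ... NX replies OK only when the key was created
}
```

An application wrapper would call `try_rebuild_lock product:42 "$$"`, rebuild and re-cache the value on success, and otherwise sleep briefly and read the cache again. The lock expires on its own, so a crashed rebuilder cannot block the key forever.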

Q2 (Beginner): What is cache penetration?

An attack or a bug keeps requesting keys that do not exist, so every request bypasses the cache and reaches the database.
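Negative caching is the simplest defence to sketch: remember "not found" for a short time so repeated lookups stop reaching the database. Everything named here is an assumption for illustration: the `REDIS_CLI` variable, the `__missing__` sentinel, the TTLs, and the loader command supplied by the caller. Because redis-cli prints nothing for a nil reply, this sketch cannot tell an empty string from a missing key:

```bash
REDIS_CLI=${REDIS_CLI:-redis-cli}   # assumed to be on PATH
MISSING=__missing__                 # hypothetical sentinel for "not in the database"

# usage: cached_lookup <key> <loader-command>
# Prints the value and exits 0, or prints nothing and exits 1 when absent.
cached_lookup() {
  val=$($REDIS_CLI GET "$1") || true
  if [ "$val" = "$MISSING" ]; then return 1; fi       # known-absent: skip the database
  if [ -n "$val" ]; then echo "$val"; return 0; fi    # cache hit
  val=$("$2" "$1") || true                            # cache miss: ask the loader
  if [ -z "$val" ]; then
    $REDIS_CLI SET "$1" "$MISSING" EX 60 >/dev/null   # remember the miss briefly
    return 1
  fi
  $REDIS_CLI SET "$1" "$val" EX 3600 >/dev/null       # cache the real value
  echo "$val"
}
```

The short TTL on the sentinel bounds how long a newly created record stays invisible. A Bloom filter in front of the cache serves the same purpose when the space of invalid keys is too large to cache one by one.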

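Q3 (Intermediate): How do you fix a hot key?

Add a local L1 cache, read from replicas, duplicate the key, split the value, or pre-warm before the event.

Q4 (Intermediate): Why does latency spike suddenly during a backup?

BGSAVE forks the process, and copy-on-write during the snapshot adds latency. Move backups to a replica and schedule them off-peak.

Q5 (Advanced): How do you debug a Redis incident systematically?

Build a timeline, then check SLOWLOG, INFO memory and stats, hot keys, the deploy diff, replication lag, and client buffers. Only then apply a targeted fix.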

Summary

Start every incident with SLOWLOG and INFO.
