Monitoring Redis
Introduction
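Redis exposes more than 100 INFO fields. Focus on used_memory, evicted_keys, instantaneous_ops_per_sec, connected_clients, replication lag (the gap between the master's master_repl_offset and each replica's offset), and latest_fork_usec. Export them to Prometheus with redis_exporter and alert before users notice.
Useful dashboard panels: memory trend, hit rate (an application metric), p99 command latency, replication lag, and evictions per minute. Growth in SLOWLOG LEN after a deploy often signals a regression; keep in mind that the log is capped at slowlog-max-len entries, so the length stops growing once it is full.
Managed Redis services add cloud-provider metrics, but you should still track the application-side cache hit rate, which Redis alone cannot see.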
Understanding the topic
Key concepts
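- INFO sections: memory, stats, replication, clients.
- Pipeline: redis_exporter → Prometheus → Grafana.
- SLOWLOG length and entries.
- Latency spikes via LATENCY LATEST (requires latency-monitor-threshold to be set above 0).
- Alert thresholds: memory above 80% of maxmemory, spikes in the evicted_keys rate.
- Application cache hit rate as a complement to server metrics.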
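The 80% memory threshold above can be checked by hand from INFO memory output. The following is a minimal sketch, not a replacement for an exporter; memory_pct is a hypothetical helper name, and it assumes the instance has maxmemory configured (a value of 0 means no limit, so no percentage exists).

```shell
# Sketch: used_memory as a percentage of maxmemory, read from INFO memory text on stdin.
# memory_pct is a hypothetical helper name, not part of redis-cli.
memory_pct() {
  tr -d '\r' | awk -F: '
    /^used_memory:/ { used = $2 }
    /^maxmemory:/   { max = $2 }
    END {
      # maxmemory of 0 means no limit is configured, so a percentage is undefined.
      if (max > 0) printf "%d\n", used * 100 / max
      else print "unlimited"
    }'
}

# Example with canned INFO lines: 768 MiB used out of a 1 GiB limit.
printf 'used_memory:805306368\r\nused_memory_human:768.00M\r\nmaxmemory:1073741824\r\n' | memory_pct
# prints 75
```

Against a live instance the input would come from redis-cli INFO memory piped into memory_pct. The anchored patterns matter: an unanchored match on used_memory would also hit used_memory_human, used_memory_rss, and similar fields.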
Step-by-step explanation
- Prometheus scrapes redis_exporter on an interval; the exporter queries INFO on each scrape.
- Prometheus stores the results as time series.
- Alertmanager fires on threshold breach.
- On-call runbook links to redis-cli checks.
- Post-incident review updates dashboards.
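The scrape-then-threshold flow above can be illustrated with the evicted_keys counter. This is a sketch of the idea only: evicted_keys_from_info and evictions_per_min are hypothetical helper names, and the reset handling (treating a lower counter as a restart from zero) is an assumption loosely modeled on how counter rates are usually computed.

```shell
# Sketch: turn two evicted_keys samples into an evictions-per-minute rate.
# Both function names are hypothetical helpers for illustration.

# Extract the evicted_keys counter from INFO stats text on stdin.
evicted_keys_from_info() {
  tr -d '\r' | awk -F: '/^evicted_keys:/ { print $2 }'
}

# Args: previous count, current count, seconds between the two samples.
evictions_per_min() {
  prev=$1; curr=$2; secs=$3
  [ "$secs" -gt 0 ] || return 1
  # A counter lower than the previous sample suggests a restart or CONFIG RESETSTAT;
  # in that case count the current value as the whole delta.
  if [ "$curr" -lt "$prev" ]; then delta=$curr; else delta=$((curr - prev)); fi
  echo $((delta * 60 / secs))
}

# Example: 300 evictions over 30 seconds.
evictions_per_min 1200 1500 30
# prints 600
```

An alert would then compare this rate against a threshold chosen for the workload, rather than against the raw counter, which only ever grows between restarts.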
Syntax reference
Common commands
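- instantaneous_ops_per_sec: traffic spike detection.
- mem_fragmentation_ratio: sustained high values are a trigger for defragmentation.
- master_link_down_since_seconds: reported by a replica while its link to the master is down, meaning replication is broken.
INFO stats
INFO memory
INFO replication
SLOWLOG LEN
LATENCY LATEST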
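Replication lag in bytes can be estimated from INFO replication on the master by subtracting each replica's offset from master_repl_offset. The sketch below assumes the usual slaveN:ip=...,offset=...,lag=... line layout; replica_lag_bytes is a hypothetical helper name.

```shell
# Sketch: per-replica lag in bytes from INFO replication text (master side) on stdin.
# replica_lag_bytes is a hypothetical helper name.
replica_lag_bytes() {
  tr -d '\r' | awk -F: '
    /^master_repl_offset:/ { master = $2 }
    /^slave[0-9]+:/ {
      name = $1
      # Drop the "slaveN:" prefix by hand so colons inside an IPv6 address do not truncate the line.
      rest = $0
      sub(/^[^:]*:/, "", rest)
      n = split(rest, kv, ",")
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^offset=/) off[name] = substr(kv[i], 8)
    }
    # master_repl_offset is printed after the slave lines, so subtract at the end.
    END { for (name in off) print name, master - off[name] }' | sort
}

# Example with canned INFO lines.
printf 'role:master\r\nconnected_slaves:1\r\nslave0:ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0\r\nmaster_repl_offset:1042\r\n' | replica_lag_bytes
# prints: slave0 42
```

On a live master the input would come from redis-cli INFO replication. A lag that keeps growing across samples is the signal to act on; a small, stable gap is normal under write load.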
Informative example
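A stopgap script that scrapes key metrics from cron until the exporter is deployed:
#!/bin/bash
# Assumes redis-cli can reach the target instance with its default host and port.
# Patterns are anchored so that only the exact field name matches.
MEM=$(redis-cli INFO memory | awk -F: '/^used_memory_human:/{print $2}' | tr -d '\r')
EVICTED=$(redis-cli INFO stats | awk -F: '/^evicted_keys:/{print $2}' | tr -d '\r')
OPS=$(redis-cli INFO stats | awk -F: '/^instantaneous_ops_per_sec:/{print $2}' | tr -d '\r')
# Send one summary line to syslog under the redis-metrics tag.
echo "memory=$MEM evicted=$EVICTED ops=$OPS" | logger -t redis-metrics
In Kubernetes, replace this script with the redis_exporter Helm chart. Tag dashboards by environment and cluster name.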
Real-world use
Real-world use cases
- Paging on-call when memory exceeds 80%.
- Detecting replication lag during a network event.
- Catching slowlog regressions after a deploy.
- Capacity planning from the ops/sec trend.
- SLA reporting for the platform team.
Best practices
- Run a Redis exporter for every instance.
- Alert on the evicted_keys rate, not just on memory.
- Track the application cache hit rate alongside server metrics.
- Log a correlation ID when cache misses spike.
- Review LATENCY DOCTOR output weekly in staging.
- Link each alert to a runbook with the relevant redis-cli steps.
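As a server-side counterpart to the application hit rate, INFO stats reports keyspace_hits and keyspace_misses. These counters are server-wide across all clients, so the ratio below complements the application metric rather than replacing it. A minimal sketch, with hit_ratio_pct as a hypothetical helper name:

```shell
# Sketch: server-wide keyspace hit ratio from INFO stats text on stdin.
# hit_ratio_pct is a hypothetical helper name.
hit_ratio_pct() {
  tr -d '\r' | awk -F: '
    /^keyspace_hits:/   { hits = $2 }
    /^keyspace_misses:/ { misses = $2 }
    END {
      total = hits + misses
      # No lookups yet means the ratio is undefined.
      if (total > 0) printf "%.1f\n", hits * 100 / total
      else print "n/a"
    }'
}

# Example with canned INFO lines: 900 hits and 100 misses.
printf 'keyspace_hits:900\r\nkeyspace_misses:100\r\n' | hit_ratio_pct
# prints 90.0
```

Because both counters accumulate since the last restart, a dashboard would normally compute the ratio over the deltas between two samples so that recent behavior is not hidden by history.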
Common mistakes
- Monitoring only CPU and not memory.
- Having no alert on replication disconnects.
- Dismissing evictions as normal.
- Building dashboards without application hit rate context.
Advanced interview questions
Q1 (Beginner): What are the key INFO sections?
Q2 (Beginner): What does evicted_keys mean?
Q3 (Intermediate): How do you detect slow commands?
Q4 (Intermediate): Which metric shows replication lag?
Q5 (Advanced): What does a Redis monitoring stack look like?
Summary
The memory, stats, and replication sections of INFO form the core scrape set.