Backup & Recovery
Introduction
Untested backups are Schrödinger's disaster recovery: until you restore one, you do not know whether it works. RDB snapshots, AOF files, and managed cloud snapshots each need quarterly restore drills that verify key counts and pass application smoke tests.
Best practice: run BGSAVE on a replica, encrypt the resulting file, upload it to object storage with versioning enabled, and document the RTO and RPO for each service. Redis on its own offers only approximate point-in-time recovery, so combine it with an application-level audit log if you need exact replay.
Restore to an isolated instance first; never overwrite production without validation.
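As a rough worked example of documenting RPO (the durations below are illustrative assumptions, not measurements): with periodic snapshots, the worst case is a failure just before the next copy becomes durable off-site, which leaves only the previous copy. The window of lost writes is therefore the snapshot interval plus the time needed to produce and upload the next snapshot:

$$
\mathrm{RPO}_{\text{worst}} \approx T_{\text{interval}} + T_{\text{BGSAVE}} + T_{\text{upload}} = 24\,\text{h} + 10\,\text{min} + 5\,\text{min} \approx 24.25\,\text{h}
$$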
Understanding the topic
Key concepts
- RDB: a point-in-time binary snapshot of the whole dataset.
- AOF: an append-only log of write commands, replayed on startup to rebuild the dataset.
- Managed automatic snapshots (e.g., ElastiCache).
- Off-site immutable copies (e.g., S3 Object Lock).
- File validation with redis-check-rdb / redis-check-aof.
- RTO/RPO documented per service tier.
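You can read back which of the two persistence modes an instance actually uses with `CONFIG GET`. The helper below is a minimal sketch: the function name `persistence_summary` and the localhost default are hypothetical. Managed services such as ElastiCache restrict the `CONFIG` command, so there you would check the parameter group instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether RDB snapshots and/or AOF are enabled.
# Requires redis-cli on PATH. When stdout is not a TTY, redis-cli prints raw
# replies, so CONFIG GET yields the parameter name on line 1 and its value on line 2.
persistence_summary() {
  local host="${1:-127.0.0.1}"
  local save appendonly
  save=$(redis-cli -h "$host" CONFIG GET save | sed -n '2p')
  appendonly=$(redis-cli -h "$host" CONFIG GET appendonly | sed -n '2p')
  if [ -n "$save" ]; then
    echo "RDB: automatic snapshots enabled (save rules: $save)"
  else
    # An empty save value disables automatic snapshots only
    echo "RDB: automatic snapshots disabled (manual BGSAVE still works)"
  fi
  echo "AOF: ${appendonly:-unknown}"
}

# Usage: persistence_summary replica.internal
```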
Step-by-step explanation
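- Schedule BGSAVE, or rely on the managed service's automatic snapshots.
- Copy the artifact to durable off-site storage.
- Validate it with redis-check-rdb or redis-check-aof.
- Restore it to a temporary, isolated Redis instance.
- Compare DBSIZE, spot-check known keys, and run application tests.
- Cut over, or discard the temporary instance.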
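The validate, restore, and compare steps above can be sketched as one bash function. This is a minimal sketch under stated assumptions: the function name `restore_drill`, the scratch port 6390, and the 1% shortfall tolerance are hypothetical choices. The tolerance exists because keys with a TTL can expire between snapshot and restore. DBSIZE counts only database 0, so use `INFO keyspace` if you use several databases.

```shell
#!/usr/bin/env bash
# Hypothetical restore-drill helper: validate an RDB copy, load it on an
# isolated localhost port, check the key count, then throw the instance away.
#   $1 RDB file   $2 expected key count (e.g. DBSIZE recorded at backup time)
#   $3 scratch port (default 6390)   $4 allowed shortfall in percent (default 1)
restore_drill() {
  local rdb="$1" expected="$2" port="${3:-6390}" tol="${4:-1}"
  local dir keys loaded=0 min
  redis-check-rdb "$rdb" >/dev/null || { echo "FAIL: redis-check-rdb rejected $rdb"; return 1; }
  # Refuse to start if something already answers on the scratch port
  if redis-cli -p "$port" PING >/dev/null 2>&1; then echo "FAIL: port $port is already in use"; return 1; fi
  dir=$(mktemp -d)                       # scratch dir, never the production data dir
  cp "$rdb" "$dir/dump.rdb"
  # --save '' and --appendonly no stop the scratch copy from writing snapshots or AOF files
  redis-server --port "$port" --bind 127.0.0.1 --dir "$dir" --dbfilename dump.rdb \
    --save '' --appendonly no --pidfile "$dir/redis.pid" --daemonize yes
  for _ in $(seq 120); do                # wait up to ~2 min for the load to finish
    if redis-cli -p "$port" INFO persistence 2>/dev/null | grep -q '^loading:0'; then
      loaded=1; break
    fi
    sleep 1
  done
  if [ "$loaded" -eq 1 ]; then keys=$(redis-cli -p "$port" DBSIZE); fi
  redis-cli -p "$port" SHUTDOWN NOSAVE >/dev/null 2>&1 || true
  rm -rf "$dir"
  [ "$loaded" -eq 1 ] || { echo "FAIL: dataset did not finish loading"; return 1; }
  min=$(( expected * (100 - tol) / 100 ))
  if [ "$keys" -ge "$min" ]; then
    echo "PASS: $keys keys (expected about $expected)"
  else
    echo "FAIL: $keys keys, expected at least $min"; return 1
  fi
}

# Usage: restore_drill ./backup-2024-01-01.rdb 1250000
```

The key count alone is a smoke signal. Spot-checking known keys and running the application tests against the scratch port remain separate steps.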
Syntax reference
Common commands
- Only a test restore proves a backup works; confirming that the file exists does not.
- Encrypt at rest in S3.
- Scrub PII when cloning prod to staging.
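```shell
redis-cli BGSAVE
# Once LASTSAVE has advanced, validate the snapshot from the server's data directory (CONFIG GET dir)
redis-check-rdb dump.rdb
# Restore test: load a copy on a non-default port so it cannot collide with a live 6379 instance
mkdir -p /tmp/restore-test && cp dump.rdb /tmp/restore-test/
redis-server --port 6380 --dir /tmp/restore-test --dbfilename dump.rdb --daemonize yes
redis-cli -p 6380 DBSIZE   # may return a LOADING error for a few seconds on large files
```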
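For the "scrub PII when cloning prod to staging" rule, one option is to run a scrub pass on the restored scratch instance before anything is copied onward. The following is a sketch only: the function name `scrub_keys` and the example pattern `user:*:email` are hypothetical. It deletes matching keys rather than masking their values, and it starts one redis-cli process per key, so it is slow on very large keyspaces.

```shell
#!/usr/bin/env bash
# Hypothetical PII scrub for a restored scratch copy. Point it ONLY at the
# isolated instance (e.g. the restore-test port), never at production.
#   $1 port of the scratch instance   $2 SCAN glob, e.g. 'user:*:email'
scrub_keys() {
  local port="$1" pattern="$2"
  # --scan walks the keyspace with SCAN, so the server is not blocked as it would be by KEYS
  redis-cli -p "$port" --scan --pattern "$pattern" | {
    n=0
    while IFS= read -r key; do
      redis-cli -p "$port" UNLINK "$key" >/dev/null   # UNLINK frees memory asynchronously
      n=$((n + 1))
    done
    echo "$n"                                        # number of keys processed
  }
}

# Usage: scrub_keys 6380 'user:*:email'
```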
Informative example
Automated backup script outline:
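```shell
#!/bin/bash
set -euo pipefail                                     # stop at the first failed step
HOST=replica.internal; FILE="backup-$(date +%F).rdb"  # compute the file name once
BEFORE=$(redis-cli -h "$HOST" LASTSAVE)               # time of the last successful save
redis-cli -h "$HOST" BGSAVE
for _ in $(seq 60); do                                # poll up to ~5 min instead of a fixed sleep 30
  [ "$(redis-cli -h "$HOST" LASTSAVE)" != "$BEFORE" ] && break; sleep 5
done
[ "$(redis-cli -h "$HOST" LASTSAVE)" != "$BEFORE" ] || { echo "BGSAVE did not complete" >&2; exit 1; }
scp "$HOST":/var/lib/redis/dump.rdb "./$FILE"
redis-check-rdb "$FILE"                               # a corrupt file aborts before the upload
aws s3 cp "$FILE" s3://dr/redis/ --sse AES256
```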
Run it against a replica, not the master. Alert if redis-check-rdb fails, and keep a minimum of 30 days of retention for compliance tiers.
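S3 Object Lock can enforce the 30-day minimum at the storage layer, so a compromised account or a faulty cleanup job cannot delete backups early. This is a sketch using the AWS CLI's `s3api put-object-lock-configuration`. It assumes the bucket (`dr` from the script above) already has Object Lock enabled. The helper names are hypothetical. COMPLIANCE-mode retention cannot be shortened or removed by any user until it expires, so trying GOVERNANCE first is the safer default.

```shell
#!/usr/bin/env bash
# Build the Object Lock default-retention document for a bucket.
#   $1 days of retention   $2 mode: GOVERNANCE (default) or COMPLIANCE
lock_config_json() {
  local days="$1" mode="${2:-GOVERNANCE}"
  case "$mode" in GOVERNANCE|COMPLIANCE) ;; *) echo "bad mode: $mode" >&2; return 1 ;; esac
  printf '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"%s","Days":%d}}}\n' "$mode" "$days"
}

# Apply the default retention to new objects in the bucket
# (assumes Object Lock is already enabled on it).
apply_retention() {
  local bucket="$1" days="$2" mode="${3:-GOVERNANCE}" config
  config=$(lock_config_json "$days" "$mode") || return 1
  aws s3api put-object-lock-configuration --bucket "$bucket" \
    --object-lock-configuration "$config"
}

# Usage: apply_retention dr 30 GOVERNANCE
```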
Real-world use
Real-world use cases
- Disaster recovery after a regional outage.
- Recovering from an accidental FLUSHDB using the last RDB.
- Refreshing staging from sanitized production data.
- A compliance audit trail of backups.
- A pre-upgrade snapshot as a rollback point.
Best practices
- Run a quarterly restore drill with a checklist.
- Back up from a replica, not the master.
- Encrypt and version off-site copies.
- Keep the backup bucket in a separate AWS account.
- Document who approves a production restore.
- Monitor backup jobs and alert on failure.
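One way to monitor backup jobs is to parse `INFO persistence`, which reports `rdb_last_bgsave_status` and `rdb_last_save_time`. This is a minimal sketch. The function name `check_rdb_health` and the 26-hour threshold in the usage line are hypothetical, and the optional second argument exists only to make the check testable.

```shell
#!/usr/bin/env bash
# Read `redis-cli INFO persistence` output on stdin and decide whether to alert.
#   $1 maximum acceptable age of the last successful save, in seconds
#   $2 current UNIX time (optional; defaults to now)
check_rdb_health() {
  local max_age="$1" now="${2:-$(date +%s)}"
  local info status last age
  info=$(tr -d '\r')                     # INFO lines end in CRLF
  status=$(printf '%s\n' "$info" | sed -n 's/^rdb_last_bgsave_status://p')
  last=$(printf '%s\n' "$info" | sed -n 's/^rdb_last_save_time://p')
  if [ -z "$last" ]; then echo "ALERT: no persistence info"; return 1; fi
  if [ "$status" != "ok" ]; then echo "ALERT: last BGSAVE status is '$status'"; return 1; fi
  age=$(( now - last ))
  if [ "$age" -gt "$max_age" ]; then echo "ALERT: last successful save ${age}s ago"; return 1; fi
  echo "OK: last successful save ${age}s ago"
}

# Usage from cron, alerting on a non-zero exit (93600 s = 26 h):
#   redis-cli -h replica.internal INFO persistence | check_rdb_health 93600
```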
Common mistakes
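- Backups that are never test-restored.
- Running BGSAVE only on the primary under load, where the fork can cause latency spikes.
- No encryption on the backup bucket.
- Flushing and restoring directly over production without first validating on an isolated instance.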
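The "BGSAVE on the primary under load" mistake can be prevented with a guard that only snapshots hosts reporting themselves as replicas. This is a sketch with a hypothetical function name `safe_bgsave`. It relies on `INFO replication` reporting replicas as `role:slave`.

```shell
#!/usr/bin/env bash
# Only issue BGSAVE when the target host reports itself as a replica.
safe_bgsave() {
  local host="$1" role
  role=$(redis-cli -h "$host" INFO replication | tr -d '\r' | sed -n 's/^role://p')
  if [ "$role" != "slave" ]; then
    echo "refusing BGSAVE on $host (role=${role:-unknown})" >&2
    return 1
  fi
  redis-cli -h "$host" BGSAVE
}

# Usage: safe_bgsave replica.internal
```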
Advanced interview questions
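Q1 (Beginner): Which command creates an RDB backup?
Q2 (Beginner): How do you validate an RDB file?
Q3 (Intermediate): What is the RPO with daily RDB snapshots?
Q4 (Intermediate): How do you take a backup without impacting the master?
Q5 (Advanced): What belongs in a DR runbook for Redis?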
Summary
Automate RDB or managed snapshots to encrypted off-site storage, and prove every backup with regular restore drills.