Rate Limiting
Introduction
Rate limiting caps the number of requests a client, user, IP address, or API key can make within a time window, protecting services from abuse, accidental loops, and DDoS attacks. Common algorithms are fixed window, sliding window log, token bucket, and leaky bucket. At gateway scale, counters are typically implemented with Redis INCR + EXPIRE or a dedicated library.
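To make the counter idea concrete, here is a minimal in-memory sketch of a fixed-window limiter. It only mirrors the Redis INCR + EXPIRE pattern: the class name `FixedWindowLimiter`, the `allow` method, and the explicit `nowMillis` parameter are illustrative assumptions, windows are aligned to multiples of the window length, and the map is neither thread-safe nor shared across gateway instances.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a Redis counter with a TTL; illustrative only. */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    // clientKey -> {window index, request count}
    private final Map<String, long[]> counters = new HashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request at nowMillis fits in the current window. */
    boolean allow(String clientKey, long nowMillis) {
        long window = nowMillis / windowMillis;
        long[] state = counters.get(clientKey);
        if (state == null || state[0] != window) {
            // A new window starts: comparable to the old Redis key having expired.
            state = new long[] {window, 0};
            counters.put(clientKey, state);
        }
        if (state[1] >= limit) {
            return false; // over the limit for this window
        }
        state[1]++; // comparable to INCR on the counter key
        return true;
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real filter would read the current time itself and store the counter in Redis so that all gateway instances share it.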
A high-level design (HLD) pairs rate limits with authentication (AuthN) tiers, such as free vs premium quotas, and returns graceful 429 responses that include a Retry-After header. Read and write endpoints usually get different limits.
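As a small illustration of the Retry-After value, the sketch below computes how many seconds remain until a fixed window resets. The helper name `RetryAfter.secondsUntilReset` is hypothetical, and it assumes windows aligned to multiples of the window length and a non-negative clock value.

```java
/** Hypothetical helper for filling the Retry-After header on a 429 response. */
final class RetryAfter {
    private RetryAfter() {}

    /** Seconds until the fixed window containing nowMillis ends, rounded up, at least 1. */
    static long secondsUntilReset(long nowMillis, long windowMillis) {
        long windowEnd = (nowMillis / windowMillis + 1) * windowMillis;
        long remainingMillis = windowEnd - nowMillis;
        // Round up so the client never retries before the window has actually reset.
        return Math.max(1, (remainingMillis + 999) / 1000);
    }
}
```

Rounding up is a deliberate choice here: telling a client to wait slightly too long is usually cheaper than inviting a retry that is rejected again.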
This lesson covers algorithm trade-offs, distributed counting, and bypass paths for health checks.
Understanding the topic
Key concepts
- Fixed window: for example, 100 req/min per user. Simple, but a client can burst at the window boundary and send up to twice the limit across two adjacent windows.
- Sliding window: a smoother limit that uses rolling time buckets in Redis.
- Token bucket: allows bursts up to the bucket size, with a steady refill rate.
- Leaky bucket: drains at a constant output rate, which shapes traffic.
- Global vs per-endpoint limits: login is limited more strictly than a public catalog.
- Over-limit requests get 429 Too Many Requests with a Retry-After value in seconds.
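The rolling-bucket idea behind the sliding window can be sketched with two counters per client: the current bucket and the previous one, weighted by how much of the previous bucket still overlaps the rolling window. This is the common sliding-window-counter approximation, not an exact log of timestamps. The class and method names are illustrative, the state lives in memory rather than Redis, and the code is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Sliding-window-counter approximation; illustrative, in-memory, not thread-safe. */
final class SlidingWindowCounter {
    private static final class State {
        long window;   // index of the current fixed bucket
        long current;  // requests counted in the current bucket
        long previous; // requests counted in the bucket just before it
    }

    private final int limit;
    private final long windowMillis;
    private final Map<String, State> states = new HashMap<>();

    SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    boolean allow(String clientKey, long nowMillis) {
        long window = nowMillis / windowMillis;
        State s = states.computeIfAbsent(clientKey, k -> {
            State fresh = new State();
            fresh.window = window;
            return fresh;
        });
        if (window != s.window) {
            // Roll the buckets; the old count only matters if it is the adjacent bucket.
            s.previous = (window == s.window + 1) ? s.current : 0;
            s.current = 0;
            s.window = window;
        }
        // Weight the previous bucket by the share of it still inside the rolling window.
        double elapsedFraction = (double) (nowMillis % windowMillis) / windowMillis;
        double estimated = s.previous * (1.0 - elapsedFraction) + s.current;
        if (estimated >= limit) {
            return false;
        }
        s.current++;
        return true;
    }
}
```

With a limit of 10 per second, a client that spends its full quota late in one window is still rejected at the start of the next one, which is exactly the boundary burst that a plain fixed window lets through.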
Internal architecture
Architecture overview
```mermaid
flowchart LR
    Client --> GW[Gateway]
    GW --> RL[Rate Limiter Redis]
    RL -->|under limit| API
    RL -->|429| Client
```
Step-by-step explanation
- The API gateway calls a Redis-backed rate limiter keyed by userId or API key.
- Free tier gets 100 rpm and premium gets 10k rpm, selected from the plan claim in the JWT.
- Use a separate limiter for expensive endpoints (/search, /export).
- Apply an IP-based limit to unauthenticated endpoints to deter scraping.
- Allowlist internal service CIDR ranges so they bypass the limiter, but still require mTLS identity.
- Alert on a sustained 429 rate: it signals either a product problem or an attack.
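The steps above can be sketched as a small policy resolver that picks the limiter key and quota for a request. The 100 rpm and 10k rpm quotas and the /search and /export paths come from the list; everything else is an assumption made for illustration, including the class name `LimitPolicy`, the key format, and the `EXPENSIVE_RPM` and `ANONYMOUS_IP_RPM` values. JWT parsing and the CIDR/mTLS bypass are out of scope here.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical resolver: maps a request to a limiter key and a per-minute quota. */
final class LimitPolicy {
    private static final Map<String, Integer> PLAN_RPM = Map.of("free", 100, "premium", 10_000);
    private static final Set<String> EXPENSIVE_PATHS = Set.of("/search", "/export");
    private static final int EXPENSIVE_RPM = 10;    // assumed value, not from the lesson
    private static final int ANONYMOUS_IP_RPM = 30; // assumed value, not from the lesson

    static final class Decision {
        final String key;
        final int requestsPerMinute;

        Decision(String key, int requestsPerMinute) {
            this.key = key;
            this.requestsPerMinute = requestsPerMinute;
        }
    }

    private LimitPolicy() {}

    /** userId and plan are null for unauthenticated requests; path must be non-null. */
    static Decision resolve(String userId, String plan, String clientIp, String path) {
        if (userId == null) {
            // Unauthenticated traffic falls back to an IP-based limit.
            return new Decision("rl:ip:" + clientIp, ANONYMOUS_IP_RPM);
        }
        if (EXPENSIVE_PATHS.contains(path)) {
            // Expensive endpoints get their own counter so they cannot drain the main quota.
            return new Decision("rl:user:" + userId + ":" + path, EXPENSIVE_RPM);
        }
        // Unknown or missing plans are treated as the free tier.
        int rpm = (plan != null && PLAN_RPM.containsKey(plan)) ? PLAN_RPM.get(plan) : PLAN_RPM.get("free");
        return new Decision("rl:user:" + userId, rpm);
    }
}
```

Keeping the policy separate from the counting logic means quotas can change per plan or endpoint without touching the limiter itself.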
Informative example
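A Redis sliding-window-log rate limiter implemented as a Spring filter:

```java
import java.io.IOException;
import java.time.Duration;
import java.util.UUID;

// jakarta.* assumes Spring 6 / Spring Boot 3; older versions use javax.servlet.*
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class RateLimitFilter extends OncePerRequestFilter {

    private static final int LIMIT = 100;
    private static final Duration WINDOW = Duration.ofMinutes(1);

    private final StringRedisTemplate redis;

    public RateLimitFilter(StringRedisTemplate redis) {
        this.redis = redis;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest req, HttpServletResponse res,
                                    FilterChain chain) throws ServletException, IOException {
        String zkey = "rl:" + resolveClientKey(req) + ":z";
        long now = System.currentTimeMillis();

        // Drop entries older than the window, then count what remains.
        // These commands are not atomic, so concurrent requests can slightly exceed LIMIT.
        redis.opsForZSet().removeRangeByScore(zkey, 0, now - WINDOW.toMillis());
        Long count = redis.opsForZSet().zCard(zkey);
        if (count != null && count >= LIMIT) {
            res.setStatus(429);
            res.setHeader("Retry-After", String.valueOf(WINDOW.toSeconds()));
            return;
        }

        // Record this request; a random member keeps same-millisecond requests distinct.
        redis.opsForZSet().add(zkey, UUID.randomUUID().toString(), now);
        redis.expire(zkey, WINDOW);
        chain.doFilter(req, res);
    }

    // Assumed helper (not shown in the original): API key header if present, else remote address.
    private String resolveClientKey(HttpServletRequest req) {
        String apiKey = req.getHeader("X-API-Key");
        return apiKey != null ? apiKey : req.getRemoteAddr();
    }
}
```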
Gateway-level limiting protects all services behind it. Use a token bucket when mobile clients need burst tolerance.
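For comparison with the sliding window filter, a token bucket can be sketched in a few lines. This is a single-process illustration, not a distributed implementation: the class name `TokenBucket`, the `tryAcquire` method, and the explicit `nowMillis` clock parameter are assumptions chosen to keep the example deterministic.

```java
/** Minimal single-process token bucket; illustrative only. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(int capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Takes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Refill in proportion to elapsed time, never above capacity.
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

The capacity sets how large a burst is tolerated and the refill rate sets the sustained throughput, so the two can be tuned independently per client type.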
Real-world use
Real-world use cases
- Public fintech API with pricing tiers based on request quota.
- Brute-force protection on the login endpoint: 5 attempts/min per IP.
- Posting limits on social platforms to curb spam.
- Throttling outbound webhook deliveries to partners.
Best practices
- Return a clear 429 with a Retry-After header.
- Key the limiter by authenticated userId when possible, not by an IP that may be a shared NAT address.
- Set different limits for read, write, and auth endpoints.
- Monitor the limit hit rate per tier.
- Document the fail-open vs fail-closed decision; the usual choice is fail open for availability, with an alert.
- Combine with a WAF and CAPTCHA when abuse patterns appear.
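The fail-open vs fail-closed choice from the list above can be made explicit in code. The sketch below wraps any limiter check and decides what happens when the backing store throws. `FailOpenGuard` and its methods are hypothetical names, and the error counter stands in for whatever metric would drive the alert.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper that applies a documented fail-open or fail-closed policy. */
final class FailOpenGuard {
    private final boolean failOpen;
    private long backendErrors; // would feed an alerting metric in a real system

    FailOpenGuard(boolean failOpen) {
        this.failOpen = failOpen;
    }

    /** Runs the limiter check; on a backend failure, falls back to the configured policy. */
    synchronized boolean allow(BooleanSupplier limiterCheck) {
        try {
            return limiterCheck.getAsBoolean();
        } catch (RuntimeException e) {
            // The limiter backend (for example Redis) is unavailable.
            backendErrors++;
            return failOpen;
        }
    }

    synchronized long backendErrors() {
        return backendErrors;
    }
}
```

A guard constructed with `true` keeps traffic flowing during a limiter outage, while `false` rejects requests until the backend recovers; either way the error count makes the outage visible.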
Common mistakes
- Rate limiting only by IP, which punishes users behind a corporate NAT.
- No bypass for health checks, so they get blocked during an incident.
- Using a fixed window without accounting for the edge burst, which can let through double the intended traffic.
- Limiting after the expensive work is done; check early in the filter chain.
- Applying the same global limit to lightweight and heavy endpoints.
Advanced interview questions
- Q1 (Beginner): Why rate limit APIs?
- Q2 (Beginner): Token bucket vs fixed window?
- Q3 (Intermediate): Where should rate limiting be implemented?
- Q4 (Intermediate): What makes distributed rate limiting challenging?
- Q5 (Advanced): Design limits for a public maps API.
Summary
Rate limiting protects services from overload and abuse. Redis-backed counters enable distributed limits at the gateway. A token bucket balances a steady rate with burst tolerance. Tier the limits by plan, and make them stricter on auth and write endpoints. 429 responses should include Retry-After guidance. Monitoring and logging complete the reliability picture.