
Rate Limiting


Introduction

Rate limiting caps the number of requests a client, user, IP address, or API key can make within a time window, protecting services from abuse, accidental retry loops, and DDoS traffic. The common algorithms are fixed window, sliding window log, token bucket, and leaky bucket. At gateway scale, counters are typically implemented with Redis INCR plus EXPIRE or with a dedicated library.
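To make the counter idea concrete, here is a minimal in-memory sketch of a fixed-window limiter. It only imitates the INCR plus EXPIRE pattern: the class name FixedWindowLimiter, the clock-aligned windows, and the explicit nowMillis parameter are assumptions made for illustration, not part of any Redis client API.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a Redis counter with a TTL (hypothetical class name). */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    // key -> {count, windowStartMillis}; in Redis this would be one counter key per window.
    private final Map<String, long[]> counters = new HashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed. The clock is passed in to keep the logic testable. */
    synchronized boolean tryAcquire(String key, long nowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis);
        long[] state = counters.get(key);
        if (state == null || state[1] != windowStart) {
            // A new window behaves like a counter key that expired and was recreated.
            state = new long[] {0, windowStart};
            counters.put(key, state);
        }
        state[0]++; // plays the role of INCR
        return state[0] <= limit;
    }
}
```

With new FixedWindowLimiter(100, 60_000), the first 100 calls for a key in a given minute return true and later calls return false until the next minute starts.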

A high-level design pairs rate limits with per-plan quotas tied to the authenticated identity (free vs. premium) and returns a graceful 429 response that includes a Retry-After header. Read and write endpoints usually get different limits.
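One way to fill in the Retry-After value is to report the time left in the current window. The helper below is a sketch under the assumption of fixed windows aligned to the epoch; RetryAfter and secondsUntilReset are hypothetical names.

```java
/** Computes a Retry-After value for clock-aligned fixed windows (hypothetical helper). */
final class RetryAfter {
    private RetryAfter() {}

    /** Seconds until the current window ends, rounded up so clients never retry too early. */
    static long secondsUntilReset(long nowMillis, long windowMillis) {
        long remainingMillis = windowMillis - (nowMillis % windowMillis);
        return (remainingMillis + 999) / 1000;
    }
}
```

A filter could then set the header with String.valueOf(RetryAfter.secondsUntilReset(now, 60_000)) instead of a hard-coded value.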

This lesson covers algorithm trade-offs, distributed counting, and bypass paths for health checks.

Understanding the topic

Key concepts

Fixed window: for example, 100 requests/min per user. Simple, but a client can burst at the window edge and get up to 2× the limit across two adjacent windows.
Sliding window: a smoother limit built from rolling time buckets in Redis.
Token bucket: allows bursts up to the bucket size, with a steady refill rate.
Leaky bucket: drains at a constant output rate, which shapes traffic.
Global vs. per-endpoint limits: login should be stricter than the public catalog.
429 Too Many Requests with a Retry-After value in seconds.
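The sliding-window idea from the list above can be sketched with two rolling buckets per key. This is the weighted-counter approximation, which is an assumption on my part; the lesson's later example uses the exact log variant instead. SlidingWindowCounter is a hypothetical name and the state lives in memory rather than Redis.

```java
import java.util.HashMap;
import java.util.Map;

/** Sliding-window counter that weights the previous bucket by its remaining overlap. */
final class SlidingWindowCounter {
    private final int limit;
    private final long windowMillis;
    // key -> {windowIndex, currentCount, previousCount}
    private final Map<String, long[]> state = new HashMap<>();

    SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(String key, long nowMillis) {
        long index = nowMillis / windowMillis;
        long[] s = state.computeIfAbsent(key, k -> new long[] {index, 0, 0});
        if (s[0] != index) {
            // Roll forward; the old bucket only counts as "previous" if it is adjacent.
            s[2] = (index - s[0] == 1) ? s[1] : 0;
            s[1] = 0;
            s[0] = index;
        }
        // Fraction of the current bucket already elapsed, so (1 - fraction) of the
        // previous bucket still overlaps the rolling window.
        double elapsedFraction = (nowMillis % windowMillis) / (double) windowMillis;
        double estimated = s[2] * (1.0 - elapsedFraction) + s[1];
        if (estimated >= limit) {
            return false;
        }
        s[1]++;
        return true;
    }
}
```

With a limit of 10 per second, a client that used all 10 requests in the previous second is admitted only about 5 more times halfway through the next second, rather than getting a fresh 10 right at the boundary.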

Internal architecture

Architecture overview

text

flowchart LR
  Client --> GW[Gateway]
  GW --> RL[Rate Limiter Redis]
  RL -->|under limit| API
  RL -->|429| Client

Step-by-step explanation

The API gateway checks a Redis-backed rate limiter keyed by userId or API key.
The limit comes from the plan claim in the JWT: 100 rpm for the free tier, 10k rpm for premium.
Expensive endpoints (/search, /export) get their own, separate limiter.
Unauthenticated endpoints get a per-IP limit to deter scraping.
Internal service CIDR ranges are allowlisted to bypass the limiter, but callers must still present an mTLS identity.
Alert on a sustained 429 rate: it signals either a product problem or an attack.
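The steps above can be sketched as a small policy class that picks the limiter key and the limit for a request. The plan limits come from the lesson; the per-endpoint caps, the anonymous per-IP limit, the key layout, and every name in LimitPolicy are hypothetical values chosen for illustration.

```java
import java.util.Map;

/** Resolves limiter key and requests-per-minute limit for a request (hypothetical policy). */
final class LimitPolicy {
    // Plan limits from the lesson: free 100 rpm, premium 10k rpm.
    private static final Map<String, Integer> PLAN_RPM = Map.of("free", 100, "premium", 10_000);
    // Assumed tighter caps for expensive endpoints.
    private static final Map<String, Integer> ENDPOINT_RPM = Map.of("/search", 30, "/export", 5);
    // Assumed per-IP limit for unauthenticated callers.
    private static final int ANONYMOUS_IP_RPM = 20;

    private LimitPolicy() {}

    /** Authenticated identity first; the IP is only a fallback. Expensive paths get their own key. */
    static String keyFor(String userIdOrApiKey, String remoteIp, String path) {
        String who = (userIdOrApiKey != null) ? "id:" + userIdOrApiKey : "ip:" + remoteIp;
        String scope = ENDPOINT_RPM.containsKey(path) ? path : "global";
        return "rl:" + who + ":" + scope;
    }

    /** plan is the JWT plan claim, or null for unauthenticated requests. path must not be null. */
    static int limitFor(String plan, String path) {
        if (plan == null) {
            return ANONYMOUS_IP_RPM;
        }
        int planLimit = PLAN_RPM.getOrDefault(plan, PLAN_RPM.get("free"));
        Integer endpointCap = ENDPOINT_RPM.get(path);
        return (endpointCap == null) ? planLimit : Math.min(planLimit, endpointCap);
    }

    /** Internal callers skip the limiter only when both the CIDR and the mTLS identity check out. */
    static boolean bypass(boolean fromInternalCidr, boolean mtlsIdentityVerified) {
        return fromInternalCidr && mtlsIdentityVerified;
    }
}
```

For example, a premium user calling /export would be limited under the key rl:id:<userId>:/export at the endpoint cap, while the same user's other calls share the global key at the plan limit.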

Informative example

A Redis sliding-window-log rate limiter implemented as a Spring filter:

java

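// Servlet imports use jakarta.* (Spring Boot 3+); on Spring Boot 2 they are javax.*.
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.time.Duration;
import java.util.UUID;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class RateLimitFilter extends OncePerRequestFilter {
    private final StringRedisTemplate redis;
    private static final int LIMIT = 100;
    private static final Duration WINDOW = Duration.ofMinutes(1);

    public RateLimitFilter(StringRedisTemplate redis) { this.redis = redis; }

    @Override
    protected void doFilterInternal(HttpServletRequest req, HttpServletResponse res,
                                    FilterChain chain) throws ServletException, IOException {
        String zkey = "rl:" + resolveClientKey(req) + ":z";
        long now = System.currentTimeMillis();
        // Sliding window log: drop entries older than the window, then count what is left.
        // These calls are separate round trips, so concurrent requests can slightly
        // overshoot LIMIT; a Lua script would make the check-and-add atomic.
        redis.opsForZSet().removeRangeByScore(zkey, 0, now - WINDOW.toMillis());
        Long count = redis.opsForZSet().zCard(zkey);
        if (count != null && count >= LIMIT) {
            res.setStatus(429);
            // Upper bound: the oldest entry leaves the window within WINDOW.
            res.setHeader("Retry-After", String.valueOf(WINDOW.toSeconds()));
            return;
        }
        redis.opsForZSet().add(zkey, UUID.randomUUID().toString(), now);
        redis.expire(zkey, WINDOW);
        chain.doFilter(req, res);
    }

    // Assumed helper (not shown in the original): authenticated user, else remote IP.
    private String resolveClientKey(HttpServletRequest req) {
        return req.getUserPrincipal() != null
                ? req.getUserPrincipal().getName() : req.getRemoteAddr();
    }
}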

Limiting at the gateway protects every service behind it. A token bucket suits mobile clients, whose traffic tends to arrive in bursts.
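For comparison with the sliding-window filter, a token bucket can be sketched in a few lines. This is an in-memory, single-key version with an explicit clock parameter; the class name TokenBucket and its constructor arguments are assumptions, and a gateway deployment would keep the token count and timestamp in a shared store.

```java
/** Single-key token bucket with a caller-supplied clock (hypothetical sketch). */
final class TokenBucket {
    private final double capacity;
    private final double refillPerMillis;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(int capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMillis = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so an idle client can burst immediately
        this.lastRefillMillis = nowMillis;
    }

    /** Takes one token if available. Tokens accrue with elapsed time, capped at capacity. */
    synchronized boolean tryConsume(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMillis);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A bucket created with capacity 3 and a refill of 1 token per second admits a burst of 3 requests at once, then roughly one request per second after that.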

Real-world use

Real-world use cases

Public fintech API with pricing tiers based on request quota.
Brute-force protection on the login endpoint: 5 attempts/min per IP.
Posting limits on social platforms to curb spam.
Outbound throttling of webhook deliveries to partners.

Best practices

Return a clear 429 with Retry-After.
Key the limiter by authenticated userId when possible, not by a shared NAT IP.
Use different limits for read, write, and auth endpoints.
Monitor the limit hit rate per tier.
Document whether the limiter fails open or closed when its backend is unavailable; failing open with an alert is the usual choice for availability.
Combine with a WAF and CAPTCHA when abuse patterns appear.
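The fail-open vs. fail-closed decision can be made explicit in code rather than left to whatever exception handling happens to do. The wrapper below is a sketch: FailOpenGuard is a hypothetical name, the limiter check is passed in as a BooleanSupplier, and the failure counter stands in for whatever metric an alert would watch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Wraps a limiter check and applies a documented policy when the backend fails. */
final class FailOpenGuard {
    private final boolean failOpen;
    private final AtomicLong backendFailures = new AtomicLong();

    FailOpenGuard(boolean failOpen) {
        this.failOpen = failOpen;
    }

    /** Returns the limiter's decision, or the configured fallback if the check throws. */
    boolean allow(BooleanSupplier limiterCheck) {
        try {
            return limiterCheck.getAsBoolean();
        } catch (RuntimeException backendError) {
            backendFailures.incrementAndGet(); // alert on this counter
            return failOpen;
        }
    }

    long backendFailures() {
        return backendFailures.get();
    }
}
```

With new FailOpenGuard(true), a Redis outage lets requests through while the failure counter climbs; with false, the same outage rejects them.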

Common mistakes

Rate limiting only by IP, which punishes users behind a corporate NAT.
No bypass for health checks, so probes get blocked during an incident.
Choosing a fixed window without accounting for the edge burst, which can let through double the intended traffic.
Limiting after the expensive work is done; check early in the filter chain.
Using the same global limit for lightweight and heavy endpoints.
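The fixed-window edge burst is easy to verify with a small simulation. The sketch below counts how many requests a fixed window admits for a sorted list of timestamps; BoundaryBurstDemo and its method are hypothetical names, and the timestamps are made-up inputs.

```java
/** Simulates a fixed-window limiter over a sorted list of request timestamps. */
final class BoundaryBurstDemo {
    private BoundaryBurstDemo() {}

    /** Returns how many requests are admitted. timestampsMillis must be in ascending order. */
    static int admitted(long[] timestampsMillis, int limit, long windowMillis) {
        long currentWindow = -1;
        int inWindow = 0;
        int admitted = 0;
        for (long t : timestampsMillis) {
            long window = t / windowMillis;
            if (window != currentWindow) {
                currentWindow = window; // counter resets at the window boundary
                inWindow = 0;
            }
            if (inWindow < limit) {
                inWindow++;
                admitted++;
            }
        }
        return admitted;
    }
}
```

With a limit of 100 per minute, 100 requests at 59,900 ms followed by 100 more at 60,100 ms are all admitted: 200 requests within 200 ms, even though each window on its own stayed within the limit.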

Advanced interview questions

Q1 (Beginner): Why rate limit APIs?

To prevent abuse, protect downstream resources, and ensure fair usage across tenants.

Q2 (Beginner): Token bucket vs. fixed window?

A token bucket allows controlled bursts; a fixed window is simpler but can allow a 2× burst at the window boundary.

Q3 (Intermediate): Where should rate limiting be implemented?

Primarily at the API gateway edge, with service-level limits on expensive operations as a second line of defense.

Q4 (Intermediate): What is the challenge in distributed rate limiting?

Counters must be shared across gateway instances (for example, in Redis); with per-instance in-memory counts, each instance admits the full limit on its own.
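A short simulation shows why per-instance counters fail. It assumes requests are spread round-robin across gateways, and it uses an AtomicInteger as a stand-in for a single shared Redis key; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Compares per-gateway counters with one shared counter for the same traffic. */
final class SharedVsLocalCounters {
    private SharedVsLocalCounters() {}

    /** Each gateway enforces the limit against its own counter only. */
    static int admittedWithLocalCounters(int gateways, int limit, int requests) {
        int[] local = new int[gateways];
        int admitted = 0;
        for (int i = 0; i < requests; i++) {
            int g = i % gateways; // round-robin load balancing
            if (local[g] < limit) {
                local[g]++;
                admitted++;
            }
        }
        return admitted;
    }

    /** Every gateway increments the same counter, as it would with one Redis key. */
    static int admittedWithSharedCounter(int limit, int requests) {
        AtomicInteger shared = new AtomicInteger();
        int admitted = 0;
        for (int i = 0; i < requests; i++) {
            if (shared.incrementAndGet() <= limit) {
                admitted++;
            }
        }
        return admitted;
    }
}
```

With 4 gateways, a limit of 100, and 1,000 requests, local counters admit 400 requests while the shared counter admits 100.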

Q5 (Advanced): Design limits for a public maps API.

API key tiers of 1k/10k/100k qps, a token bucket with 2× burst, 429 with Retry-After, counters in a Redis cluster, a stricter per-endpoint limit on geocoding, and anomaly detection that blocks abusive keys.

Summary

Rate limiting protects services from overload and abuse. Redis-backed counters enable distributed limits at the gateway. A token bucket balances a steady rate with burst tolerance. Set limits per plan tier, and make them stricter on auth and write endpoints. 429 responses should include Retry-After guidance. Monitoring and logging complete the reliability picture.
