High-Level Design Tutorial, Lesson 32

    Rate Limiting

    Rate limiting caps requests per client, user, IP, or API key within a time window — protecting services from abuse, accidental loops, and DDoS.

    Introduction

    The common algorithms are fixed window, sliding window log, token bucket, and leaky bucket. At gateway scale, counters are typically implemented with Redis INCR and EXPIRE or with a dedicated library.

    A high-level design pairs rate limits with plan tiers tied to the authenticated identity (free vs premium quotas) and returns a graceful 429 response that includes a Retry-After header. Read and write endpoints usually get different limits.

    This lesson covers algorithm trade-offs, distributed counting, and bypass paths for health checks.

    Understanding the topic

    Key concepts

    • Fixed window: for example 100 req/min per user. Simple, but allows a burst of up to 2× the limit across a window boundary.
    • Sliding window: a smoother limit using rolling time buckets in Redis.
    • Token bucket: allows bursts up to the bucket size, with a steady refill rate.
    • Leaky bucket: drains at a constant rate, smoothing (shaping) the output traffic.
    • Global vs per-endpoint limits: login should be stricter than a public catalog.
    • On rejection, return 429 Too Many Requests with a Retry-After header (in seconds).
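
    To make the fixed window boundary burst concrete, here is a minimal in-memory sketch in Java. The class name `FixedWindowLimiter` and the explicit `nowMillis` parameter are illustrative assumptions, not part of the lesson; a gateway would keep the counter in Redis rather than in a local map.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory fixed window counter. Time is passed in so the behaviour is easy to trace. */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    // key -> {window index, request count in that window}
    private final Map<String, long[]> counters = new HashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a request for this key at time nowMillis is allowed. */
    synchronized boolean tryAcquire(String key, long nowMillis) {
        long window = nowMillis / windowMillis;
        long[] state = counters.get(key);
        if (state == null || state[0] != window) {
            // Entering a new window resets the count to zero.
            state = new long[] {window, 0};
            counters.put(key, state);
        }
        if (state[1] >= limit) {
            return false;
        }
        state[1]++;
        return true;
    }
}
```

    With a limit of 100 per 60-second window, 100 calls at t = 59,900 ms and another 100 at t = 60,000 ms are all admitted: 200 requests within about 100 ms, which is the 2× edge burst.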

    Internal architecture

    Architecture overview

    text
    flowchart LR
    Client --> GW[Gateway]
    GW --> RL[Rate Limiter Redis]
    RL -->|under limit| API
    RL -->|429| Client

    Step-by-step explanation

    1. The API gateway consults a Redis-backed rate limiter keyed by userId or API key.
    2. The free tier gets 100 rpm and the premium tier 10k rpm, with the tier read from the plan claim in the JWT.
    3. Expensive endpoints (/search, /export) get a separate limiter.
    4. Unauthenticated endpoints are limited by IP to deter scraping.
    5. Internal service CIDR ranges are allowlisted to bypass the limiter, but callers still present an mTLS identity.
    6. Alert on a sustained 429 rate; it signals either a product problem or an attack.
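
    The steps above could be captured in a small policy function that picks the limiter key and quota for each request. This is a sketch under assumptions: `LimitPolicy`, `Decision`, the key format, and the quotas for expensive and anonymous traffic are hypothetical; only the 100 and 10k rpm tiers come from the text.

```java
import java.util.Set;

/** Hypothetical policy resolver: maps a request to a limiter key and a per-minute quota. */
final class LimitPolicy {
    record Decision(boolean bypass, String key, int requestsPerMinute) {}

    private static final Set<String> EXPENSIVE = Set.of("/search", "/export");
    private static final int FREE_RPM = 100;        // from the text
    private static final int PREMIUM_RPM = 10_000;  // from the text
    private static final int EXPENSIVE_RPM = 10;    // assumed value, for illustration only
    private static final int ANON_RPM = 30;         // assumed value, for illustration only

    static Decision resolve(String path, String userId, String plan,
                            String clientIp, boolean internalCaller) {
        if (internalCaller) {
            // Step 5: allowlisted internal traffic skips the limiter (identity is verified elsewhere).
            return new Decision(true, "", 0);
        }
        if (userId == null) {
            // Step 4: unauthenticated requests are keyed by IP.
            return new Decision(false, "rl:ip:" + clientIp, ANON_RPM);
        }
        if (EXPENSIVE.contains(path)) {
            // Step 3: expensive endpoints get their own bucket per user.
            return new Decision(false, "rl:user:" + userId + ":" + path, EXPENSIVE_RPM);
        }
        // Steps 1-2: key by user, quota by plan claim.
        int rpm = "premium".equals(plan) ? PREMIUM_RPM : FREE_RPM;
        return new Decision(false, "rl:user:" + userId, rpm);
    }
}
```

    Keeping this decision in one place makes it easier to audit which callers bypass the limiter and why.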

    Informative example

    A Redis sliding window log rate limiter, implemented as a Spring filter:

    java
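    // Imports assume Spring Boot 3 (jakarta.*); use javax.servlet.* on Spring Boot 2.
    import java.io.IOException;
    import java.time.Duration;
    import java.util.UUID;
    import jakarta.servlet.FilterChain;
    import jakarta.servlet.ServletException;
    import jakarta.servlet.http.HttpServletRequest;
    import jakarta.servlet.http.HttpServletResponse;
    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.stereotype.Component;
    import org.springframework.web.filter.OncePerRequestFilter;

    @Component
    public class RateLimitFilter extends OncePerRequestFilter {
        private final StringRedisTemplate redis;
        private static final int LIMIT = 100;
        private static final Duration WINDOW = Duration.ofMinutes(1);

        public RateLimitFilter(StringRedisTemplate redis) { this.redis = redis; }

        @Override
        protected void doFilterInternal(HttpServletRequest req, HttpServletResponse res,
                FilterChain chain) throws ServletException, IOException {
            String zkey = "rl:" + resolveClientKey(req) + ":z";
            long now = System.currentTimeMillis();
            // Drop entries older than the window, then count what remains.
            redis.opsForZSet().removeRangeByScore(zkey, 0, now - WINDOW.toMillis());
            Long count = redis.opsForZSet().zCard(zkey);
            if (count != null && count >= LIMIT) {
                res.setStatus(429);
                // Upper bound: the oldest entry may leave the window sooner than this.
                res.setHeader("Retry-After", String.valueOf(WINDOW.toSeconds()));
                return;
            }
            // Note: this check-then-add is not atomic, so concurrent requests can slightly
            // exceed LIMIT. A Lua script or MULTI/EXEC would close that gap.
            redis.opsForZSet().add(zkey, UUID.randomUUID().toString(), now);
            redis.expire(zkey, WINDOW);
            chain.doFilter(req, res);
        }

        // Assumed helper (not shown in the original): prefer the authenticated user, else the IP.
        private String resolveClientKey(HttpServletRequest req) {
            return req.getUserPrincipal() != null
                    ? req.getUserPrincipal().getName() : req.getRemoteAddr();
        }
    }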

    Gateway-level limiting protects all services behind it. Use a token bucket for mobile clients, whose traffic tends to arrive in bursts.
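
    One way to implement the token bucket mentioned above is sketched below as a rough in-memory version. `TokenBucket` and its parameters are illustrative names, and the clock reading is passed in (for example from `System.nanoTime()`) so the refill arithmetic is easy to follow. A multi-instance gateway would need to hold the same state in a shared store such as Redis.

```java
/** Token bucket: holds up to `capacity` tokens, refilled continuously at `refillPerSecond`. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;        // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryAcquire(long nowNanos) {
        // Add the tokens earned since the last call, capped at the bucket size.
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = nowNanos;
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

    The bucket size sets how large a burst is tolerated, while the refill rate sets the long-run average, so the two can be tuned separately per tier.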

    Real-world use

    Real-world use cases

    • Public fintech API with pricing tiers based on request quota.
    • Brute-force protection on a login endpoint: 5 attempts/min per IP.
    • Posting limits on social platforms to curb spam.
    • Throttling outbound webhook deliveries to partners.

    Best practices

    • Return a clear 429 with a Retry-After header.
    • Key the limiter by authenticated userId when possible, not by an IP that may be shared behind NAT.
    • Set different limits for read, write, and auth endpoints.
    • Monitor the limit hit rate per tier.
    • Document whether the limiter fails open or closed when its store is unavailable; failing open with an alert is the usual choice for availability.
    • Combine with a WAF and CAPTCHA when abuse patterns appear.
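
    The fail-open choice from the list above can be made explicit in code. The wrapper below is a minimal sketch, assuming the real check (for example a Redis call) is exposed as a `Predicate<String>` that may throw; `FailOpenLimiter` and `storeFailures` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Wraps a limiter check so that a backing-store failure admits the request and is counted. */
final class FailOpenLimiter {
    private final Predicate<String> delegate;   // the real check; may throw if the store is down
    private final AtomicLong storeFailures = new AtomicLong();

    FailOpenLimiter(Predicate<String> delegate) {
        this.delegate = delegate;
    }

    /** Returns the delegate's decision, or true (fail open) if the delegate throws. */
    boolean allow(String key) {
        try {
            return delegate.test(key);
        } catch (RuntimeException e) {
            storeFailures.incrementAndGet();    // export this as a metric and alert on it
            return true;
        }
    }

    long storeFailures() {
        return storeFailures.get();
    }
}
```

    Failing closed would only require returning false in the catch block; either way, the counter is what makes the degraded state visible.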

    Common mistakes

    • Rate limiting only by IP, which punishes users behind a corporate NAT.
    • No bypass for health checks, so they get blocked during an incident.
    • Using a fixed window without accounting for the boundary burst, which can let through double the traffic.
    • Limiting after the expensive work is done; check early in the filter chain.
    • Applying the same limit to lightweight and heavy endpoints.
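
    One common way to soften the fixed window edge burst is a sliding window counter, which weights the previous window's count by how much of it still overlaps the rolling window. The sketch below is an approximation and not the exact sliding window log used in the Spring filter example. `SlidingWindowCounter` is an illustrative name, it tracks a single client key, and it assumes timestamps never go backwards.

```java
/** Sliding window counter: estimates a rolling-window count from two fixed-window counts. */
final class SlidingWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private long currentCount;
    private long previousCount;

    SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a request at time nowMillis keeps the estimated count under the limit. */
    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) {
            // Roll forward. The old count only matters if it is the immediately preceding window.
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        // Fraction of the current fixed window that has already elapsed.
        double elapsedFraction = (nowMillis % windowMillis) / (double) windowMillis;
        double estimated = previousCount * (1.0 - elapsedFraction) + currentCount;
        if (estimated >= limit) {
            return false;
        }
        currentCount++;
        return true;
    }
}
```

    With a limit of 100 per minute, a client that used its full quota at the end of one window is rejected at the start of the next and regains capacity gradually, about half of it by mid-window.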

    Advanced interview questions

    Q1 (Beginner): Why rate limit APIs?
    To prevent abuse, protect downstream resources, and ensure fair usage across tenants.
    Q2 (Beginner): Token bucket vs fixed window?
    A token bucket allows controlled bursts. A fixed window is simpler but can let through up to 2× the limit around a window boundary.
    Q3 (Intermediate): Where should rate limiting be implemented?
    Primarily at the API gateway edge, with service-level limits on expensive operations as a second line of defense.
    Q4 (Intermediate): What is the challenge with distributed rate limiting?
    Counters must be shared (for example in Redis) across gateway instances. With local in-memory counters, each instance enforces the limit independently, so a client spread across N instances can get up to N times its quota.
    Q5 (Advanced): Design limits for a public maps API.
    API key tiers of 1k/10k/100k qps, a token bucket allowing 2× bursts, 429 responses with Retry-After, counters in a Redis cluster, stricter per-endpoint limits for geocoding, and anomaly detection that blocks abusive keys.
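
    The distributed counting problem from Q4 can be illustrated with a toy simulation. This is only a sketch: `CounterSharingDemo` is a hypothetical name, an `AtomicInteger` stands in for a shared Redis key, requests are spread round-robin, and everything is assumed to happen within one window.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Compares per-instance counters with one shared counter for the same client and limit. */
final class CounterSharingDemo {
    /** Spreads `requests` round-robin across `instances` gateways and returns how many pass. */
    static int admitted(int requests, int instances, int limit, boolean shared) {
        AtomicInteger sharedCounter = new AtomicInteger();   // stands in for a Redis key
        AtomicInteger[] localCounters = new AtomicInteger[instances];
        for (int i = 0; i < instances; i++) {
            localCounters[i] = new AtomicInteger();
        }
        int passed = 0;
        for (int r = 0; r < requests; r++) {
            AtomicInteger counter = shared ? sharedCounter : localCounters[r % instances];
            if (counter.incrementAndGet() <= limit) {
                passed++;
            }
        }
        return passed;
    }
}
```

    For 1,000 requests against a limit of 100 on three instances, local counters admit 300 requests while the shared counter admits 100.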

    Summary

    Rate limiting protects services from overload and abuse. Redis-backed counters enable distributed gateway limits. Token bucket balances steady rate with burst tolerance. Tier limits by plan; stricter on auth and write endpoints. 429 responses should include Retry-After guidance. Monitoring and logging complete the reliability picture.
