Horizontal vs Vertical Scaling
Introduction
Scaling increases system capacity to meet load. Vertical scaling (scale up) adds CPU, RAM, or faster disks to existing machines. Horizontal scaling (scale out) adds more machines and distributes work across them. Most cloud-native high-level designs (HLD) scale out; vertical scaling remains useful for databases and quick wins.
Interviewers expect you to pick the right lever per component: stateless API servers scale out easily, while relational primary databases scale up first and then shard. Understanding the limits of each approach prevents designs that assume infinite linear scale.
This lesson compares trade-offs, auto-scaling patterns, and how scaling choices interact with load balancers and data tiers.
Understanding the topic
Key concepts
- Vertical: simpler ops and no partitioning issues, but a hard ceiling (the largest available instance) and usually downtime during resize.
- Horizontal: near-linear capacity for stateless tiers; requires load distribution; tolerates the failure of individual instances.
- Stateless services: keep session state in a shared store such as Redis so that any instance can handle any request.
- Stateful tiers: databases and Kafka brokers need careful horizontal strategies (sharding, replication factor).
- Auto-scaling: the Kubernetes Horizontal Pod Autoscaler (HPA) scales on CPU, RPS, or custom metrics.
- Diminishing returns: Amdahl's law says the serial portion of the work caps the achievable speedup.
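The diminishing-returns point can be made concrete with Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of workers. The sketch below is illustrative only: the 95% parallel fraction is an assumed workload, not a measurement from any real system.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed workload: 95% of the work parallelizes, 5% stays serial.
    for n in (1, 2, 8, 64, 1024):
        print(f"{n:>5} workers -> {amdahl_speedup(0.95, n):.2f}x")
```

Under this assumption the speedup can never exceed 1 / 0.05 = 20x, however many machines are added, which is why a design with a serial bottleneck (for example, a single write primary) should not be assumed to scale linearly.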
Internal architecture
Architecture overview
flowchart LR
  subgraph Vertical
    VM1[More CPU RAM]
  end
  subgraph Horizontal
    VM2 --> VM3 --> VM4
  end
Step-by-step explanation
- Scale the database vertically first, until metrics show CPU or I/O saturation near the largest instance size.
- Add read replicas (horizontal read scaling) before sharding writes.
- Scale the API tier horizontally behind a load balancer (LB); run at least two instances in different availability zones (AZs) for HA.
- Use a connection pooler (PgBouncer) when many app instances hit the DB.
- Cache hot reads in a Redis cluster (sharded horizontally by key hash).
- Define scaling triggers: CPU > 70%, p99 latency SLO breach, queue depth threshold.
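The last step, defining scaling triggers, can be sketched as a small check over current metrics. This is a minimal illustration rather than a real autoscaler: the `Metrics` and `Thresholds` types are hypothetical, the 70% CPU threshold comes from the list above, and the latency SLO and queue-depth values are assumed placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    cpu_utilization: float  # fraction between 0.0 and 1.0
    p99_latency_ms: float
    queue_depth: int


@dataclass(frozen=True)
class Thresholds:
    cpu_utilization: float = 0.70   # from the text: CPU > 70%
    p99_latency_ms: float = 300.0   # assumed SLO, not from the text
    queue_depth: int = 1_000        # assumed threshold, not from the text


def breached_triggers(m: Metrics, t: Thresholds = Thresholds()) -> list[str]:
    """Return the names of every trigger that currently calls for scale-out."""
    breached = []
    if m.cpu_utilization > t.cpu_utilization:
        breached.append("cpu")
    if m.p99_latency_ms > t.p99_latency_ms:
        breached.append("p99_latency")
    if m.queue_depth > t.queue_depth:
        breached.append("queue_depth")
    return breached


if __name__ == "__main__":
    sample = Metrics(cpu_utilization=0.82, p99_latency_ms=450.0, queue_depth=10)
    print(breached_triggers(sample))
```

Treating any single breach as a reason to scale out, instead of requiring all of them, matches the usual intent: each trigger guards a different failure mode (compute saturation, user-visible latency, backlog growth).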
Informative example
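A Kubernetes HPA manifest that scales a Spring Boot deployment on CPU utilization and a custom requests-per-second (RPS) metric. The Pods metric is only available if a custom metrics adapter exposes it to the cluster: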
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: catalog-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: catalog-api
  minReplicas: 3
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "800"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
In an interview, address database scaling separately: an HPA on the app tier does not fix a saturated PostgreSQL primary. Mention read replicas and connection limits.
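To reason about what such a manifest does, it helps to know the core rule the HPA documents: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), evaluated per metric, with the largest result winning. The sketch below is a simplified model of that rule, not the controller's actual code: it ignores unready pods, missing metrics, and stabilization windows, and the function names are hypothetical. The 0.1 tolerance mirrors the documented default, and the replica bounds are taken from the manifest above.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Replica count proposed by one metric, using the documented HPA ratio rule."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


def hpa_recommendation(current_replicas: int,
                       metrics: list[tuple[float, float]],
                       min_replicas: int = 3,
                       max_replicas: int = 40) -> int:
    """Take the largest per-metric proposal, then clamp to the replica bounds."""
    proposals = [desired_replicas(current_replicas, current, target)
                 for current, target in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 10 pods at 90% CPU (target 65) and 1000 RPS per pod (target 800).
    print(hpa_recommendation(10, [(90.0, 65.0), (1000.0, 800.0)]))
```

In this example CPU proposes 14 replicas and RPS proposes 13, so the recommendation is 14. Note that nothing in this calculation knows about the database, which is why the data tier needs its own scaling plan.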
Real-world use
Real-world use cases
- OTT video origin: horizontal edge caches; vertical transcode workers with GPUs.
- E-commerce flash sale: horizontal pod burst + queue absorption.
- Banking batch jobs: vertical high-memory nodes for overnight reconciliation.
- Social feed: horizontal stateless feed generators; sharded Cassandra for writes.
Best practices
- Make the app tier stateless before scaling out.
- Load test to find the knee in the latency-versus-load curve; don't over-provision blindly.
- Scale down slowly to avoid flapping during traffic dips.
- Monitor DB connection counts at every scale-out event.
- Use multi-AZ horizontal spread for availability, not just capacity.
- Document maximum shard and instance limits in capacity estimates.
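The advice to monitor DB connections comes down to simple arithmetic: each app instance opens its own pool, so total connections grow linearly with the instance count. The sketch below uses assumed, illustrative numbers (a pool of 10 per instance, a database limit of 100 connections, 10 reserved for admin and replication); real values depend on your pool and database configuration, and the function names are hypothetical.

```python
def total_db_connections(app_instances: int, pool_size_per_instance: int) -> int:
    """Connections the app tier can open when every pool is full."""
    return app_instances * pool_size_per_instance


def max_instances_within_budget(max_connections: int, reserved: int,
                                pool_size_per_instance: int) -> int:
    """Largest instance count whose pools still fit under the DB limit."""
    usable = max_connections - reserved
    if usable <= 0:
        return 0
    return usable // pool_size_per_instance


if __name__ == "__main__":
    # Assumed values for illustration only.
    pool_size, db_limit, reserved = 10, 100, 10
    print("connections at 40 pods:", total_db_connections(40, pool_size))
    print("pods that fit directly:",
          max_instances_within_budget(db_limit, reserved, pool_size))
```

With these assumptions, 40 pods could demand 400 connections while only 9 pods fit within the limit, which is the gap a connection pooler such as PgBouncer is meant to close by multiplexing many client connections over fewer server connections.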
Common mistakes
- Scaling sticky-session apps horizontally without a shared session store.
- Adding 100 app pods while the DB remains a single small instance.
- Ignoring cold-start time: new pods respond slowly during scale-up if JVM warmup is heavy.
- Assuming linear scale while all writes are synchronized through one primary DB.
- Resizing the production DB vertically during peak without a replica failover plan.
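The first mistake is easy to reproduce in a toy model. The sketch below is a simplified simulation, not a real web stack: `AppInstance` is a hypothetical class, a plain dict stands in for Redis, and `itertools.cycle` stands in for a round-robin load balancer without sticky routing.

```python
from itertools import cycle
from typing import Optional


class AppInstance:
    """One app server; its session store is either private or shared."""

    def __init__(self, name: str, session_store: dict):
        self.name = name
        self.session_store = session_store

    def login(self, session_id: str, user: str) -> None:
        self.session_store[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.session_store.get(session_id)


def run(instances: list) -> Optional[str]:
    """Log in on one instance, then read the session on the next one."""
    load_balancer = cycle(instances)  # round robin, no stickiness
    next(load_balancer).login("s1", "alice")
    return next(load_balancer).whoami("s1")


# Each instance keeps sessions in its own memory.
local = [AppInstance("a", {}), AppInstance("b", {})]

# Both instances point at one store (a dict standing in for Redis).
shared_store: dict = {}
shared = [AppInstance("a", shared_store), AppInstance("b", shared_store)]

if __name__ == "__main__":
    print("local sessions: ", run(local))
    print("shared sessions:", run(shared))
```

With per-instance memory the second request lands on an instance that has never seen the session, so the user appears logged out; with the shared store any instance can serve any request, which is the precondition for scaling out freely.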
Advanced interview questions
Q1 (Beginner): What is the difference between horizontal and vertical scaling?
Q2 (Beginner): Which tier is easiest to scale horizontally?
Q3 (Intermediate): Why do databases resist horizontal write scaling?
Q4 (Intermediate): What triggers auto-scaling in Kubernetes?
Q5 (Advanced): How would you design scaling for a 10× traffic spike within 5 minutes?
Summary
Scale out stateless tiers; scale up or shard stateful data stores. Auto-scaling ties infrastructure to SLOs and cost control. Connection pools and caches protect databases during horizontal app growth. Multi-AZ horizontal deployment improves both availability and capacity. Capacity estimation informs when each lever activates. Load balancers, covered in the next lesson, distribute traffic across horizontally scaled app instances.