Performance Optimization
Go is fast by default, but production services need deliberate optimization: pprof profiling, reducing allocations, connection pooling, and caching.
Introduction
Premature optimization wastes time: profile first, then optimize only the hot paths that the data identifies.
Tools: go tool pprof for CPU and heap profiles, go tool trace for goroutine scheduling analysis, and benchstat for comparing benchmark runs. sync.Pool reuses allocated objects; escape analysis output (go build -gcflags=-m) shows whether a value stays on the stack or moves to the heap. Interviewers ask how you'd diagnose a slow API under load.
This lesson demonstrates pprof endpoints, allocation reduction, and connection pool tuning — skills expected at mid-level Go backend roles.
The story
Google's Bigtable client and Cloudflare's edge proxies optimize hot paths with sync.Pool to reuse byte buffers, avoid allocations in tight loops, and profile with pprof before optimizing. A payment service cut P99 latency 40% by pooling JSON encode buffers — not by rewriting in Rust, but by removing GC pressure identified in a CPU profile.
Measure first: go test -bench, net/http/pprof, and execution traces reveal where time and allocations actually go.
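As a rough illustration of measuring before optimizing, the sketch below uses testing.Benchmark to compare two ways of building a string and reports allocations per operation. The names joinConcat and joinBuilder and the input size are made up for this example; in a real project the same loop bodies would normally live in a _test.go file and run through go test -bench.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// joinConcat builds the result with +=, which copies the string on every step.
func joinConcat(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder appends into a single growing buffer.
func joinBuilder(parts []string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	parts := make([]string, 200) // hypothetical input size
	for i := range parts {
		parts[i] = "x"
	}

	// testing.Benchmark runs a benchmark function outside of "go test".
	for name, fn := range map[string]func([]string) string{
		"concat":  joinConcat,
		"builder": joinBuilder,
	} {
		res := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				_ = fn(parts)
			}
		})
		fmt.Printf("%-8s %8d ns/op %6d allocs/op\n", name, res.NsPerOp(), res.AllocsPerOp())
	}
}
```

The exact numbers depend on the machine and Go version, which is the point: the decision comes from the measurement, not from intuition.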
Understanding the topic
Key concepts
- Importing net/http/pprof registers /debug/pprof/ handlers on http.DefaultServeMux; serve them on an admin port.
- CPU profile: go tool pprof http://localhost:6060/debug/pprof/profile.
- Heap profile finds allocation hotspots.
- sync.Pool reuses temporary objects reducing GC pressure.
- GOMAXPROCS defaults to runtime.NumCPU() and rarely needs changing.
- The compiler inlines small functions; go build -gcflags=-m prints its inlining and escape analysis decisions.
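To make the escape analysis point concrete, here is a small sketch that counts allocations with testing.AllocsPerRun. The point type, the sink variable, and both function names are invented for the example. Building it with go build -gcflags=-m should also print a "moved to heap" note for the second function.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// sink is package-level, so anything stored here must outlive the call.
var sink *point

// sumOnStack keeps p local; escape analysis can leave it on the stack.
func sumOnStack(i int) int {
	p := point{x: i, y: i}
	return p.x + p.y
}

// storeOnHeap publishes the address of p through sink, so p escapes to the heap.
func storeOnHeap(i int) {
	p := point{x: i, y: i}
	sink = &p
}

func main() {
	stack := testing.AllocsPerRun(1000, func() { _ = sumOnStack(3) })
	heap := testing.AllocsPerRun(1000, func() { storeOnHeap(3) })
	fmt.Printf("local value:     %.0f allocs per call\n", stack)
	fmt.Printf("escaping value:  %.0f allocs per call\n", heap)
}
```

The first function should report no allocations and the second at least one per call; in a hot loop that difference is what shows up as GC pressure.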
Step-by-step explanation
- Reproduce slowness under load (hey, k6, vegeta).
- Capture CPU profile during load test.
- Identify top functions in pprof flame graph.
- Reduce allocations in hot loop — biggest GC win.
- Tune DB pool, HTTP client pool, cache hot reads.
- Re-benchmark to verify improvement.
Practical code example
pprof server and sync.Pool for buffer reuse:
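package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
	"sync"
)

// bufPool hands out reusable byte slices. Storing *[]byte instead of []byte
// avoids an extra allocation each time the slice is put back into the pool.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
	bufPtr := bufPool.Get().(*[]byte)
	buf := (*bufPtr)[:0]
	defer func() {
		*bufPtr = buf[:0] // keep any grown capacity, drop the contents
		bufPool.Put(bufPtr)
	}()

	for i := 0; i < 100; i++ {
		buf = append(buf, byte(i%26+'a'))
	}
	w.Write(buf)
}

func main() {
	// Admin server: a nil handler means DefaultServeMux, which holds only pprof.
	go func() {
		log.Println("pprof on :6060")
		log.Println(http.ListenAndServe(":6060", nil))
	}()

	// The public API gets its own mux so pprof is not reachable on :8080.
	api := http.NewServeMux()
	api.HandleFunc("/data", handleRequest)
	log.Println("api on :8080")
	log.Fatal(http.ListenAndServe(":8080", api))
}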
Line-by-line code explanation
- sync.Pool reuses temporary objects: reset state on Get, return clean objects in Put.
- bytes.Buffer with pre-allocation via Grow(n) reduces slice reallocation in serializers.
- import _ "net/http/pprof" exposes profiling endpoints on the debug server.
- go tool pprof cpu.prof analyzes CPU profiles to find hot functions.
- Avoid string + concatenation in loops: use strings.Builder instead.
- Preallocate slices: make([]T, 0, expectedLen) when size is known.
- GOGC tunes garbage collector aggressiveness: lower values trade CPU for lower memory.
- Compiler inlining: small functions inline automatically; profile before manual micro-optimization.
Key takeaway: Importing net/http/pprof registers its handlers on http.DefaultServeMux. sync.Pool reduces GC work from repeated buffer allocation. Profile before optimizing.
Real-world use
Where you'll use this in production
- API latency reduction under 10K RPS load.
- Memory leak diagnosis in long-running workers.
- JSON serialization optimization in hot handlers.
- Database query optimization once a block profile or execution trace shows handlers waiting on the database (a CPU profile alone does not show wait time).
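For the memory leak case in a worker that has no HTTP server, a heap profile can also be captured in-process with runtime/pprof. The heapSnapshot helper below is a hypothetical name; a real worker would more likely write the bytes to a file on a timer or on a signal.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapSnapshot returns the current heap profile in the format
// that "go tool pprof" reads.
func heapSnapshot() ([]byte, error) {
	runtime.GC() // refresh heap statistics before sampling
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapSnapshot()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of heap profile\n", len(data))
}
```

Taking two snapshots some minutes apart and comparing them (go tool pprof accepts a -base profile) is a common way to see which allocation sites keep growing.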
Best practices
- Profile production-like load, not idle servers.
- Reduce allocations before micro-tuning CPU.
- Cache immutable read-heavy data with TTL.
- Use connection pools — DB and HTTP.
- Separate admin/pprof port from public API.
- Document baseline benchmarks in repo.
Common mistakes
- Optimizing without profiling data.
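- Putting objects back into sync.Pool while they still hold references to large or request-scoped data, which keeps that memory alive.
- Leaving pprof on the public port, which is a security risk.
- Ignoring the database as a bottleneck: tuning JSON encoding while slow SQL dominates latency.
- Leaving GOMAXPROCS at the host CPU count in a container with a lower CPU limit (Go releases before 1.25 do not read the cgroup limit).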
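One way to avoid the public pprof mistake is to register the profiling handlers explicitly on a dedicated admin mux. The sketch below assumes hypothetical helpers newAdminMux, newAPIMux, and status, and checks the routing in memory with httptest instead of opening ports. Note that importing net/http/pprof still registers the handlers on http.DefaultServeMux as a side effect, so the public server must not serve DefaultServeMux.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newAdminMux registers the pprof handlers explicitly on a private mux.
func newAdminMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// newAPIMux holds only public routes; /data is a placeholder.
func newAPIMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/data", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	return mux
}

// status performs an in-memory GET against h and returns the status code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println("admin /debug/pprof/:", status(newAdminMux(), "/debug/pprof/"))
	fmt.Println("api   /debug/pprof/:", status(newAPIMux(), "/debug/pprof/"))
	// In a service, each mux would be passed to its own http.ListenAndServe,
	// with the admin listener bound to an address that is not publicly routable.
}
```

The admin mux should answer the index request while the API mux returns 404 for the same path, which is the separation the list above asks for.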
Advanced interview questions
Q1 (Beginner): Go profiling tools?
Q2 (Beginner): sync.Pool purpose?
Q3 (Intermediate): Diagnose high memory usage?
Q4 (Intermediate): Optimize handler doing JSON + DB?
Q5 (Advanced): API 500ms p99: investigation steps?
Summary
Profile with pprof before optimizing — data drives decisions. Reduce allocations in hot paths for GC improvements. sync.Pool reuses buffers; tune connection pools. Keep pprof on admin port only; secure in production. Next lesson: security best practices.