Go (Golang) Tutorial: Lesson 42

    Performance Optimization

    Go is fast by default, but production services need deliberate optimization: pprof profiling, reducing allocations, connection pooling, and caching.


    Introduction

    Premature optimization wastes time: profile first, then optimize only the hot paths that the data shows are slow.

    Tools: go tool pprof for CPU and heap profiles, go tool trace for goroutine scheduling analysis, and benchstat for comparing benchmark runs. sync.Pool reuses allocated objects; the compiler's escape analysis output shows which values stay on the stack and which move to the heap. Interviewers ask how you'd diagnose a slow API under load.

    This lesson demonstrates pprof endpoints, allocation reduction, and connection pool tuning — skills expected at mid-level Go backend roles.

    The story

    Google's Bigtable client and Cloudflare's edge proxies optimize hot paths with sync.Pool to reuse byte buffers, avoid allocations in tight loops, and profile with pprof before optimizing. A payment service cut P99 latency 40% by pooling JSON encode buffers — not by rewriting in Rust, but by removing GC pressure identified in a CPU profile.

    Measure first: go test -bench, net/http/pprof, and execution traces reveal where time and allocations actually go.
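
    As a minimal sketch of "measure first", the program below compares two ways of joining strings using the standard benchmark harness. The function names (concatJoin, builderJoin, measure) are hypothetical and exist only for this illustration. It calls testing.Benchmark directly so it runs as a plain program; in a real project the same loops would normally live in BenchmarkXxx functions in a _test.go file and be run with go test -bench=. -benchmem.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results reachable so the compiler cannot discard the work.
var sink string

// concatJoin builds the result with += in a loop; most iterations
// allocate a new, longer string.
func concatJoin(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// builderJoin sizes a strings.Builder once and appends into it.
func builderJoin(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	var sb strings.Builder
	sb.Grow(total)
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

// measure runs fn under the standard benchmark harness.
func measure(fn func([]string) string, parts []string) testing.BenchmarkResult {
	// Init registers the testing flags; it has no effect if already called.
	testing.Init()
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = fn(parts)
		}
	})
}

func main() {
	parts := make([]string, 200)
	for i := range parts {
		parts[i] = "item"
	}
	slow := measure(concatJoin, parts)
	fast := measure(builderJoin, parts)
	fmt.Printf("concat:  %d ns/op, %d allocs/op\n", slow.NsPerOp(), slow.AllocsPerOp())
	fmt.Printf("builder: %d ns/op, %d allocs/op\n", fast.NsPerOp(), fast.AllocsPerOp())
}
```

    The absolute numbers depend on the machine, so the useful signal is the ratio between the two lines, especially allocs/op.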

    Understanding the topic

    Key concepts

    • Importing net/http/pprof registers handlers under /debug/pprof/ on http.DefaultServeMux; serve them on a separate admin port.
    • CPU profile: go tool pprof http://localhost:6060/debug/pprof/profile.
    • A heap profile finds allocation hotspots.
    • sync.Pool reuses temporary objects, reducing GC pressure.
    • GOMAXPROCS defaults to runtime.NumCPU(); it rarely needs changing.
    • The compiler inlines small functions; go build -gcflags=-m prints its inlining and escape analysis decisions.
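
    To make the escape analysis point concrete, here is a small sketch that counts allocations with testing.AllocsPerRun. The type and function names (point, staysOnStack, escapesToHeap) are hypothetical. Building the same file with go build -gcflags=-m should report the escaping variable as moved to the heap.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// heapSink is package-level: storing a pointer here means the pointee
// must outlive the function call, so it cannot live on the stack.
var heapSink *point

// staysOnStack uses a local value that never leaves the function.
func staysOnStack(a, b int) int {
	p := point{x: a, y: b}
	return p.x + p.y
}

// escapesToHeap publishes the address of a local value, forcing a
// heap allocation on every call.
func escapesToHeap(a, b int) {
	p := point{x: a, y: b}
	heapSink = &p
}

func main() {
	stack := testing.AllocsPerRun(1000, func() { _ = staysOnStack(1, 2) })
	heap := testing.AllocsPerRun(1000, func() { escapesToHeap(1, 2) })
	fmt.Printf("staysOnStack:  %.0f allocs/run\n", stack)
	fmt.Printf("escapesToHeap: %.0f allocs/run\n", heap)
}
```

    The difference between the two counts is the kind of per-call allocation that adds up to GC pressure in a hot loop.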

    Step-by-step explanation

    1. Reproduce slowness under load (hey, k6, vegeta).
    2. Capture CPU profile during load test.
    3. Identify top functions in pprof flame graph.
    4. Reduce allocations in hot loop — biggest GC win.
    5. Tune DB pool, HTTP client pool, cache hot reads.
    6. Re-benchmark to verify improvement.

    Practical code example

    pprof server and sync.Pool for buffer reuse:

    go
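    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
        "sync"
    )

    // bufPool hands out reusable byte slices. Storing *[]byte avoids an extra
    // allocation each time the slice is placed back into the pool.
    var bufPool = sync.Pool{
        New: func() any {
            b := make([]byte, 0, 4096)
            return &b
        },
    }

    func handleRequest(w http.ResponseWriter, r *http.Request) {
        bufPtr := bufPool.Get().(*[]byte)
        buf := (*bufPtr)[:0] // reset length, keep capacity
        defer func() {
            *bufPtr = buf[:0] // keep any capacity gained by append
            bufPool.Put(bufPtr)
        }()
        for i := 0; i < 100; i++ {
            buf = append(buf, byte(i%26+'a')) // append a single byte
        }
        if _, err := w.Write(buf); err != nil {
            log.Printf("write response: %v", err)
        }
    }

    func main() {
        // Admin server: a nil handler means DefaultServeMux, which holds the pprof routes.
        go func() {
            log.Println("pprof on :6060")
            log.Println(http.ListenAndServe(":6060", nil))
        }()

        // Public API: a dedicated mux keeps /debug/pprof/ off port 8080.
        api := http.NewServeMux()
        api.HandleFunc("/data", handleRequest)
        log.Println("api on :8080")
        log.Fatal(http.ListenAndServe(":8080", api))
    }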

    Line-by-line code explanation

    • sync.Pool reuses temporary objects: reset their state after Get (or before Put) so no stale data carries over between uses.
    • bytes.Buffer with pre-allocation via Grow(n) reduces slice reallocation in serializers.
    • import _ "net/http/pprof" exposes profiling endpoints on the debug server.
    • go tool pprof cpu.prof analyzes CPU profiles to find hot functions.
    • Avoid string + concatenation in loops; use strings.Builder instead.
    • Preallocate slices with make([]T, 0, expectedLen) when the size is known.
    • GOGC tunes how aggressively the garbage collector runs: lower values spend more CPU on collection to keep memory usage lower.
    • The compiler inlines small functions automatically; profile before manual micro-optimization.

    Key takeaway: Importing net/http/pprof registers handlers on DefaultServeMux. sync.Pool reduces GC work from repeated buffer allocation. Profile before optimizing.
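
    The slice preallocation advice above can be checked with a short sketch. The names (expectedLen, grow, prealloc) are hypothetical; the program fills a slice with and without a capacity hint and counts heap allocations using testing.AllocsPerRun.

```go
package main

import (
	"fmt"
	"testing"
)

// expectedLen stands in for a size known ahead of time (an assumption
// made for this example).
const expectedLen = 1000

// sink keeps the slices reachable so they are heap-allocated as they
// would be when returned to a caller.
var sink []int

// grow starts from a nil slice and lets append reallocate as needed.
func grow() []int {
	var out []int
	for i := 0; i < expectedLen; i++ {
		out = append(out, i)
	}
	return out
}

// prealloc reserves the full capacity once, so append never reallocates.
func prealloc() []int {
	out := make([]int, 0, expectedLen)
	for i := 0; i < expectedLen; i++ {
		out = append(out, i)
	}
	return out
}

func main() {
	grown := testing.AllocsPerRun(100, func() { sink = grow() })
	pre := testing.AllocsPerRun(100, func() { sink = prealloc() })
	fmt.Printf("grow:     %.0f allocs/run\n", grown)
	fmt.Printf("prealloc: %.0f allocs/run\n", pre)
}
```

    Each reallocation in grow also copies the existing elements, so the saving is in both allocation count and copy work.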

    Real-world use

    Where you'll use this in production

    • API latency reduction under 10K RPS load.
    • Memory leak diagnosis in long-running workers.
    • JSON serialization optimization in hot handlers.
    • Database query optimization after pprof shows wait time.

    Best practices

    • Profile production-like load, not idle servers.
    • Reduce allocations before micro-tuning CPU.
    • Cache immutable read-heavy data with TTL.
    • Use connection pools — DB and HTTP.
    • Separate admin/pprof port from public API.
    • Document baseline benchmarks in repo.

    Common mistakes

    • Optimizing without profiling data.
    • Putting objects into sync.Pool that still hold references to other data: that memory stays reachable and behaves like a leak.
    • Leaving pprof on a public port: a security risk.
    • Ignoring the database as a bottleneck: optimizing JSON while the SQL is slow.
    • Setting GOMAXPROCS incorrectly in a container with CPU limits.

    Advanced interview questions

    Q1 (Beginner): Go profiling tools?
    pprof CPU/heap, trace, go test -bench, runtime/metrics.
    Q2 (Beginner): sync.Pool purpose?
    Reuse temporary objects between GC cycles; this reduces the allocation rate.
    Q3 (Intermediate): Diagnose high memory usage?
    heap pprof; look for retained objects; check goroutine leaks.
    Q4 (Intermediate): Optimize handler doing JSON + DB?
    Profile both; pool buffers; prepared statements; cache reads; index slow queries.
    Q5 (Advanced): API 500ms p99, investigation steps?
    Reproduce load; CPU+trace profile; check DB slow log; network latency; fix top pprof frame; re-measure.
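
    For the memory and goroutine-leak answers above, a worker without an HTTP server can collect the same data through runtime/pprof. This is a sketch under assumptions: the helper names (heapProfile, goroutineDump) are hypothetical, and the blocked goroutines only simulate a leak.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns a heap profile in the format go tool pprof reads.
func heapProfile() ([]byte, error) {
	runtime.GC() // run a collection first so the statistics are current
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// goroutineDump returns a text listing of goroutine stacks;
// debug=1 groups goroutines that share an identical stack.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Simulated leak: these goroutines wait on a channel nobody writes to.
	block := make(chan struct{})
	for i := 0; i < 5; i++ {
		go func() { <-block }()
	}

	heap, err := heapProfile()
	if err != nil {
		log.Fatal(err)
	}
	dump, err := goroutineDump()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("heap profile: %d bytes\n", len(heap))
	fmt.Printf("goroutines:   %d\n", runtime.NumGoroutine())
	fmt.Printf("dump size:    %d bytes\n", len(dump))
}
```

    Writing the heap profile bytes to a file lets go tool pprof open it later; a goroutine count that keeps climbing between dumps is a common sign of a leak.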

    Summary

    Profile with pprof before optimizing — data drives decisions. Reduce allocations in hot paths for GC improvements. sync.Pool reuses buffers; tune connection pools. Keep pprof on admin port only; secure in production. Next lesson: security best practices.
