Go (Golang) Tutorial · Lesson 36 · “Goroutines, APIs & cloud-native backends”

Benchmark Testing

Introduction

Benchmarks measure performance with func BenchmarkXxx(b *testing.B). Go reruns each benchmark with a larger b.N until the run lasts long enough to time reliably (one second by default, adjustable with -benchtime). Compare implementations with benchstat, profile with pprof, and report allocations with ReportAllocs.

Benchmarks matter in Go interviews for high-throughput services — JSON serialization, cache lookups, and string building are common targets. Understanding ns/op, B/op (bytes allocated), and allocs/op metrics guides optimization decisions.

This lesson benchmarks string concatenation vs strings.Builder and demonstrates memory allocation reporting — skills that differentiate mid-level backend engineers.

The story

A Cloudflare JSON serialization hot path processes millions of edge config updates per minute. Before switching marshalers, engineers write benchmarks: go test -bench=. compares encoding/json vs alternatives, reporting ns/op and allocations/op — data that drives decisions worth millions in CPU savings across global infrastructure.

Benchmarks guard against premature optimization and validate the optimizations you do make: profile first, benchmark the change, deploy with confidence.

Understanding the topic

Key concepts

func BenchmarkName(b *testing.B) in *_test.go.
b.N iterations adjusted automatically by framework.
b.ResetTimer() excludes setup from measurement.
b.ReportAllocs() shows allocations per operation.
go test -bench=. -benchmem runs all benchmarks and prints B/op and allocs/op.
benchstat compares two benchmark runs statistically.
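The concepts above can be sketched in one small benchmark. This is a minimal illustration, not a reference implementation: makeInput, sortedCopy, and sinkInts are hypothetical names invented for the example, and the input size is arbitrary.

```go
package main

import (
	"sort"
	"testing"
)

// makeInput is a hypothetical setup helper: it builds a reverse-sorted slice.
func makeInput(n int) []int {
	in := make([]int, n)
	for i := range in {
		in[i] = n - i
	}
	return in
}

// sortedCopy is the code under test: it copies the input and sorts the copy.
func sortedCopy(in []int) []int {
	out := make([]int, len(in))
	copy(out, in)
	sort.Ints(out)
	return out
}

// sinkInts keeps the result alive so the compiler cannot discard the call.
var sinkInts []int

func BenchmarkSortedCopy(b *testing.B) {
	in := makeInput(10_000) // setup: should not be timed
	b.ReportAllocs()        // report B/op and allocs/op even without -benchmem
	b.ResetTimer()          // discard time and allocations spent on setup
	for i := 0; i < b.N; i++ {
		sinkInts = sortedCopy(in)
	}
}
```

Placed in a file ending in _test.go, this runs with go test -bench=SortedCopy. Without the b.ResetTimer() call, the cost of makeInput would be counted in the measured time.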

Step-by-step explanation

Benchmark function loops b.N times calling code under test.
Setup before loop; b.ResetTimer() before measured section.
go test -bench=BenchmarkFoo -count=5 for stable results.
pprof: go test -bench=. -cpuprofile=cpu.out.
Compare before/after optimization with benchstat.
Benchmark parallel with b.RunParallel for concurrent code.
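The last step, b.RunParallel, has a slightly different loop shape from the usual b.N loop. A minimal sketch follows, assuming a hypothetical mutex-guarded counter type as the concurrent code under test.

```go
package main

import (
	"sync"
	"testing"
)

// counter is a hypothetical mutex-guarded type standing in for concurrent code.
type counter struct {
	mu sync.Mutex
	n  int
}

// Inc increments the counter under the lock.
func (c *counter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

func BenchmarkCounterParallel(b *testing.B) {
	var c counter
	b.ReportAllocs()
	// RunParallel distributes the b.N iterations across several goroutines;
	// the number of goroutines defaults to GOMAXPROCS.
	b.RunParallel(func(pb *testing.PB) {
		// Next reports whether this goroutine has more iterations to run.
		for pb.Next() {
			c.Inc()
		}
	})
}
```

Running it with the -cpu flag, for example go test -bench=CounterParallel -cpu=1,4,8, shows how contention on the lock changes ns/op as parallelism grows.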

Practical code example

Benchmark comparing string concat vs Builder with allocation reporting:

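// Save in a file ending in _test.go so that go test picks it up.
package main

import (
	"strings"
	"testing"
)

// sink receives every result so the compiler cannot eliminate the calls.
var sink string

// concatLoop builds an n-byte string with +=, copying the string on each iteration.
func concatLoop(n int) string {
	s := ""
	for i := 0; i < n; i++ {
		s += "x"
	}
	return s
}

// builderLoop builds the same string with strings.Builder, preallocating n bytes.
func builderLoop(n int) string {
	var b strings.Builder
	b.Grow(n)
	for i := 0; i < n; i++ {
		b.WriteByte('x')
	}
	return b.String()
}

func BenchmarkConcat(b *testing.B) {
	b.ReportAllocs() // report B/op and allocs/op
	for i := 0; i < b.N; i++ {
		sink = concatLoop(100)
	}
}

func BenchmarkBuilder(b *testing.B) {
	b.ReportAllocs() // report B/op and allocs/op
	for i := 0; i < b.N; i++ {
		sink = builderLoop(100)
	}
}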

Output

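BenchmarkMarshal-8   500000   2847 ns/op   512 B/op   4 allocs/op

This sample line comes from a benchmark named BenchmarkMarshal, not from the code above, which prints one line each for BenchmarkConcat and BenchmarkBuilder in the same format. The columns are: benchmark name with a GOMAXPROCS suffix (-8), iterations run (b.N), time per operation, bytes allocated per operation, and allocations per operation.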

Line-by-line code explanation

func BenchmarkConcat(b *testing.B): benchmark function names start with Benchmark and take a *testing.B.
b.ResetTimer() excludes setup time from the measured iterations; call it after any expensive setup and before the loop (the example above has no setup, so it is omitted there).
for i := 0; i < b.N; i++: the framework sets b.N to stabilize timing.
b.ReportAllocs() tracks allocations per operation alongside ns/op.
go test -bench=. -benchmem ./pkg/codec runs benchmarks with memory stats.
b.RunParallel(func(pb *testing.PB) { ... }) benchmarks concurrent code paths.
Compare with benchstat: save the output of a run before and after the change, then use golang.org/x/perf/cmd/benchstat to check significance.
Avoid compiler elimination: assign results to a package-level sink variable so the call cannot be optimized away.

Key takeaway: run go test -bench=. -benchmem. BenchmarkBuilder should show fewer allocs/op than BenchmarkConcat, because b.Grow preallocates the builder's capacity once.

Real-world use

Where you'll use this in production

Optimizing JSON marshal hot paths in APIs.
Comparing cache implementations before production.
Validating regex vs string ops for log parsing.
CI regression detection on critical path latency.

Best practices

Benchmark before optimizing — profile proves bottleneck.
Use b.ReportAllocs() for allocation-sensitive code.
Run with -count=5 and compare with benchstat.
Reset timer after expensive setup.
Benchmark realistic input sizes from production metrics.
Don't micro-optimize without benchmark evidence.
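One way to cover several input sizes in a single benchmark is sub-benchmarks via b.Run. The sketch below is illustrative only: repeatX and sinkSized are hypothetical names, and the sizes are placeholders rather than values taken from real production metrics.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// repeatX builds a string of n 'x' bytes with a preallocated strings.Builder.
func repeatX(n int) string {
	var sb strings.Builder
	sb.Grow(n)
	for i := 0; i < n; i++ {
		sb.WriteByte('x')
	}
	return sb.String()
}

// sinkSized keeps results alive so the calls are not optimized away.
var sinkSized string

func BenchmarkRepeatXSizes(b *testing.B) {
	// Placeholder sizes; in practice, derive them from production metrics.
	for _, size := range []int{16, 1024, 65536} {
		// Each b.Run call is reported as its own benchmark line.
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				sinkSized = repeatX(size)
			}
		})
	}
}
```

Each size appears as a separate result, such as BenchmarkRepeatXSizes/size=1024, so benchstat can compare them individually across runs.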

Common mistakes

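Benchmarking with inconsistent build flags: keep flags identical across the runs you compare, and leave out -race and -gcflags=-N, which distort timings.
Including setup in the timed section without b.ResetTimer().
Drawing conclusions from a single run: high variance misleads.
Optimizing cold paths while ignoring the top functions in pprof.
Comparing laptop results against CI results from a different CPU.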

Advanced interview questions

Q1 (Beginner): Run benchmarks?

go test -bench=. -benchmem ./...

Q2 (Beginner): b.N meaning?

Iteration count auto-tuned for reliable timing measurement.

Q3 (Intermediate): ReportAllocs purpose?

Shows bytes and allocation count per operation — GC pressure indicator.

Q4 (Intermediate): pprof from benchmarks?

go test -bench=. -cpuprofile=cpu.out; go tool pprof cpu.out.

Q5 (Advanced): Optimize JSON API handler latency.

pprof CPU+heap; benchmark marshal; pool buffers; reduce allocations; cache read-heavy data.

Summary

Benchmarks measure ns/op and allocations with testing.B. Profile with pprof to find the bottleneck, then benchmark before and after each change. strings.Builder beats += in loops; verify with -benchmem. Compare runs with benchstat, and run -race separately from benchmarks because the race detector distorts timings. Next lesson: structured logging with slog.
