HyperLogLog
HyperLogLog (HLL) estimates unique cardinality in fixed ~12KB with ~0.81% standard error — PFADD and PFCOUNT replace SET-based UV tracking that would consume gigabytes.
Introduction
HyperLogLog (HLL) estimates unique cardinality in fixed ~12KB with ~0.81% standard error — PFADD and PFCOUNT replace SET-based UV tracking that would consume gigabytes.
Amazon product page analytics and home-screen UV counts use HLL when exact uniques are unnecessary but memory must stay bounded. PFMERGE combines daily sketches into weekly totals.
Trade accuracy for space — never use HLL when you need exact membership or per-user iteration.
Understanding the topic
Key concepts
- PFADD key element — add to sketch; duplicates ignored statistically.
- PFCOUNT key — estimated distinct count.
- PFMERGE dest src1 src2 — union of sketches.
- Fixed memory ~12KB per key regardless of billions of elements.
- Error ~0.81% — not suitable for billing.
- Elements are strings — IPs, userIds, session tokens.
Step-by-step explanation
- HLL uses probabilistic registers — hash each element to bucket.
- Maximum register values estimate cardinality via harmonic mean.
- PFADD O(1); PFCOUNT O(N) on register count (tiny).
- Merge takes max per register — union estimate.
- Cannot list members — count only.
Syntax reference
Common commands
- PFADD idempotent statistically — safe to retry.
- PFCOUNT on empty key returns 0.
- PFMERGE — weekly rollup from daily keys.
PFADD uv:home:2026-07-01 ip:1.2.3.4 user:42PFCOUNT uv:home:2026-07-01PFMERGE uv:home:week26 uv:day1 uv:day2 uv:day3
Informative example
Track unique visitors per page without storing every IP:
# Each page viewPFADD uv:product:sku42 192.168.1.1PFADD uv:product:sku42 10.0.0.5# End of day dashboardPFCOUNT uv:product:sku42# (integer) 2 — approximate if scale is huge
For exact small sets use SET. Switch to HLL when cardinality exceeds ~100K per key.
Real-world use
Real-world use cases
- Unique visitors per article/product page.
- Unique search queries per hour.
- CDN edge unique client IPs (sampled).
- Ad impression reach estimation.
- Weekly UV rollup via PFMERGE.
Best practices
- Key per day/page: uv:{page}:{date}.
- Set TTL on daily sketches after merge.
- Document ~1% error in dashboards.
- Don't use for financial unique counts.
- Combine with SET for small beta features.
- Monitor key count — each HLL is cheap but unbounded keys aren't.
Common mistakes
- Expecting exact counts for billing.
- Trying to enumerate HLL members.
- Using HLL for sets under 10K — SET is fine.
- No TTL on daily uv keys.
Advanced interview questions
Q1BeginnerWhat is HyperLogLog?
Q2BeginnerHLL error rate?
Q3IntermediatePFADD vs SADD for UV?
Q4IntermediateCan you delete one element from HLL?
Q5AdvancedBill by unique API callers?
Summary
HLL = approximate unique count in ~12KB.