Redis Tutorial 0/42 lessons ~6 min read Lesson 14

    HyperLogLog

    HyperLogLog (HLL) estimates unique cardinality in fixed ~12KB with ~0.81% standard error — PFADD and PFCOUNT replace SET-based UV tracking that would consume gigabytes.

    Course progress0%
    Focus
    10 guided sections
    Practice signal
    Examples included
    Career prep
    Interview Q&A included

    Introduction

    HyperLogLog (HLL) estimates unique cardinality in fixed ~12KB with ~0.81% standard error — PFADD and PFCOUNT replace SET-based UV tracking that would consume gigabytes.

    Amazon product page analytics and home-screen UV counts use HLL when exact uniques are unnecessary but memory must stay bounded. PFMERGE combines daily sketches into weekly totals.

    Trade accuracy for space — never use HLL when you need exact membership or per-user iteration.

    Understanding the topic

    Key concepts

    • PFADD key element — add to sketch; duplicates ignored statistically.
    • PFCOUNT key — estimated distinct count.
    • PFMERGE dest src1 src2 — union of sketches.
    • Fixed memory ~12KB per key regardless of billions of elements.
    • Error ~0.81% — not suitable for billing.
    • Elements are strings — IPs, userIds, session tokens.

    Step-by-step explanation

    1. HLL uses probabilistic registers — hash each element to bucket.
    2. Maximum register values estimate cardinality via harmonic mean.
    3. PFADD O(1); PFCOUNT O(N) on register count (tiny).
    4. Merge takes max per register — union estimate.
    5. Cannot list members — count only.

    Syntax reference

    Common commands

    • PFADD idempotent statistically — safe to retry.
    • PFCOUNT on empty key returns 0.
    • PFMERGE — weekly rollup from daily keys.
    bash
    PFADD uv:home:2026-07-01 ip:1.2.3.4 user:42
    PFCOUNT uv:home:2026-07-01
    PFMERGE uv:home:week26 uv:day1 uv:day2 uv:day3

    Informative example

    Track unique visitors per page without storing every IP:

    bash
    # Each page view
    PFADD uv:product:sku42 192.168.1.1
    PFADD uv:product:sku42 10.0.0.5
    # End of day dashboard
    PFCOUNT uv:product:sku42
    # (integer) 2 — approximate if scale is huge

    For exact small sets use SET. Switch to HLL when cardinality exceeds ~100K per key.

    Real-world use

    Real-world use cases

    • Unique visitors per article/product page.
    • Unique search queries per hour.
    • CDN edge unique client IPs (sampled).
    • Ad impression reach estimation.
    • Weekly UV rollup via PFMERGE.

    Best practices

    • Key per day/page: uv:{page}:{date}.
    • Set TTL on daily sketches after merge.
    • Document ~1% error in dashboards.
    • Don't use for financial unique counts.
    • Combine with SET for small beta features.
    • Monitor key count — each HLL is cheap but unbounded keys aren't.

    Common mistakes

    • Expecting exact counts for billing.
    • Trying to enumerate HLL members.
    • Using HLL for sets under 10K — SET is fine.
    • No TTL on daily uv keys.

    Advanced interview questions

    Q1BeginnerWhat is HyperLogLog?
    Probabilistic structure estimating unique count in ~12KB.
    Q2BeginnerHLL error rate?
    Standard error ~0.81%.
    Q3IntermediatePFADD vs SADD for UV?
    SADD exact O(N) memory; PFADD fixed space approximate.
    Q4IntermediateCan you delete one element from HLL?
    No — only PFADD; no per-element removal.
    Q5AdvancedBill by unique API callers?
    Use exact SET or DB — HLL error unacceptable for money.

    Summary

    HLL = approximate unique count in ~12KB.

    Ready to mark this lesson complete?Track your journey across the entire course.