HyperLogLog: The Analysis of a near-Optimal Cardinality Estimation Algorithm¶
Authors: Philippe Flajolet, Eric Fusy, Olivier Gandouet, Frederic Meunier
Published: 2007 (Conference Paper)
Source: Conference on Analysis of Algorithms (AofA)
Algorithm: HyperLogLog
DOI: 10.46298/dmtcs.3545
Summary¶
Derives and analyzes HyperLogLog, the now-standard streaming sketch for approximate distinct counting. Its key contribution is a memory-efficient estimator with predictable relative error that composes naturally across distributed streams.
Abstract¶
This extended abstract describes and analyses a near-optimal probabilistic algorithm, HyperLogLog, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, short bytes), HyperLogLog performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1.04/sqrt(m). This improves on the best previously known cardinality estimator, LogLog, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
Links¶
Primary
Standard
Tags¶
-
Cardinality estimation
-
Streaming algorithms
-
Probabilistic data structures
-
HyperLogLog
-
Sketching
-
Hashing