Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review
Submitted manuscript
Final published version
We develop a new algorithm for the turnstile heavy hitters problem in general turnstile streams, the EXPANDERSKETCH, which finds the approximate top-k items in a universe of size n using the same asymptotic O(k log n)words of memory and O(log n) update time as the COUNTMIN and COUNTSKETCH, but requiring only O(k poly(log n)) time to answer queries instead of the O(n log n) time of the other two. The notion of "approximation" is the same l(2) sense as the COUNTSKETCH, which given known lower bounds is the strongest guarantee one can achieve in sublinear memory.
Our main innovation is an efficient reduction from the heavy hitters problem to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm that partitions the graph into pieces while finding every cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel local search techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our clustering algorithm may be of broader interest beyond heavy hitters and streaming algorithms.
Original language | English |
---|---|
Journal | Communications of the ACM |
Volume | 62 |
Issue | 8 |
Pages (from-to) | 95-100 |
Number of pages | 6 |
ISSN | 0001-0782 |
DOIs | |
Publication status | Published - Aug 2019 |
See relations at Aarhus University Citationformats
ID: 161790678