Aarhus University Seal

CountSketches, Feature Hashing and the Median of Three

Research output: Contribution to book/anthology/report/proceedingArticle in proceedingsResearchpeer-review

  • Kasper Green Larsen
  • Rasmus Pagh, University of Copenhagen
  • ,
  • Jakub Tetek, University of Copenhagen

In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector v to a vector of dimension (2t − 1)s, where t, s > 0 are integer parameters. It is known that a CountSketch allows estimating coordinates of v with variance bounded by kvk 2 2/s. For t > 1, the estimator takes the median of 2t − 1 independent estimates, and the probability that the estimate is off by more than 2kvk 2/√s is exponentially small in t. This suggests choosing t to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant t. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to O(min{kvk 2 1/s 2, kvk 2 2/s}) when t > 1. That is, the variance decreases proportionally to s −2, asymptotically for large enough s.

Original languageEnglish
Title of host publicationProceedings of the 38th International Conference on Machine Learning, ICML 2021
EditorsMarina Meila, Tong Zhang
Number of pages10
Publication year2021
Pages6011-6020
ISBN (electronic)9781713845065
Publication statusPublished - 2021
EventInternational Conference on Machine Learning, 18-24 July 2021, Virtual -
Duration: 18 Jul 202124 Jul 2021

Conference

ConferenceInternational Conference on Machine Learning, 18-24 July 2021, Virtual
Periode18/07/202124/07/2021
SeriesProceedings of Machine Learning Research
Volume139

See relations at Aarhus University Citationformats

ID: 229267475