Aarhus University Seal

Succinct Sampling from Discrete Distributions

Research output: Contribution to book/anthology/report/proceedingArticle in proceedingsResearchpeer-review

We revisit the classic problem of sampling from a discrete distribution: Given n non-negative w-bit integers x_1,...,x_n, the task is to build a data structure that allows sampling i with probability proportional to x_i. The classic solution is Walker's alias method that takes, when implemented on a Word RAM, O(n) preprocessing time, O(1) expected query time for one sample, and n(w+2 lg n+o(1)) bits of space. Using the terminology of succinct data structures, this solution has redundancy 2n lg n+o(n) bits, i.e., it uses 2n lg n+o(n) bits in addition to the information theoretic minimum required for storing the input. In this paper, we study whether this space usage can be improved.
In the systematic case, in which the input is read-only, we present a novel data structure using r+O(w) redundant bits, O(n/r) expected query time and O(n) preprocessing time for any r. This is an improvement in redundancy by a factor of Omega(log n) over the alias method for r = n, even though the alias method is not systematic. Moreover, we complement our data structure with a lower bound showing that this trade-off is tight for systematic data structures.
In the non-systematic case, in which the input numbers may be represented in more clever ways than just storing them one-by-one, we demonstrate a very surprising separation from the systematic case: With only 1 redundant bit, it is possible to support optimal O(1) expected query time and O(n) preprocessing time!
On the one hand, our results improve upon the space requirement of the classic solution for a fundamental sampling problem, on the other hand, they provide the strongest known separation between the systematic and non-systematic case for any data structure problem. Finally, we also believe our upper bounds are practically efficient and simpler than Walker's alias method.
Original languageEnglish
Title of host publicationProceedings of the 45th annual ACM Symposium on Theory of Computing, STOC '13
Number of pages8
PublisherAssociation for Computing Machinery
Publication year2013
Pages775-782
ISBN (print)978-1-4503-2029-0
DOIs
Publication statusPublished - 2013
EventACM Symposium on the Theory of Computing - Palo Alto, Canada
Duration: 1 Jun 20134 Jun 2013
Conference number: 45

Conference

ConferenceACM Symposium on the Theory of Computing
Nummer45
LandCanada
ByPalo Alto
Periode01/06/201304/06/2013
SeriesA C M Symposium on the Theory of Computing. Annual Proceedings
ISSN0737-8017

See relations at Aarhus University Citationformats

ID: 56781380