TY - JOUR

T1 - Are Bitvectors Optimal?

AU - Buhrman, Harry

AU - Miltersen, Peter Bro

AU - Radhakrishnan, Jaikumar

AU - Venkatesh, Srinivasan

PY - 2002

Y1 - 2002

N2 - We study the it static membership problem: Given a set S of at most n keys drawn from a universe U of size m, store it so that queries of the form "Is u in S?" can be answered by making few accesses to the memory. We study schemes for this problem that use space close to the information theoretic lower bound of $\Omega(n\log(\frac{m}{n}))$ bits and yet answer queries by reading a small number of bits of the memory.We show that, for $\epsilon > 0$, there is a scheme that stores $O(\frac{n}{\epsilon^2}\log m)$ bits and answers membership queries using a randomized algorithm that reads just one bit of memory and errs with probability at most $\epsilon$. We consider schemes that make no error for queries in S but are allowed to err with probability at most $\epsilon$ for queries not in S. We show that there exist such schemes that store $O((\frac{n}{\epsilon})^2 \log m)$ bits and answer queries using just one bitprobe. If multiple probes are allowed, then the number of bits stored can be reduced to $O(n^{1+\delta}\log m)$ for any $\delta > 0$. The schemes mentioned above are based on probabilistic constructions of set systems with small intersections.We show lower bounds that come close to our upper bounds (for a large range of n and $\epsilon$): Schemes that answer queries with just one bitprobe and error probability $\epsilon$ must use $\Omega(\frac{n}{\epsilon\log(1/\epsilon)} \log m)$ bits of storage; if the error is restricted to queries not in S, then the scheme must use $\Omega(\frac{n^2}{\epsilon^2 \log (n/\epsilon)}\log m)$ bits of storage. We also consider deterministic schemes for the static membership problem and show tradeoffs between space and the number of probes.

AB - We study the it static membership problem: Given a set S of at most n keys drawn from a universe U of size m, store it so that queries of the form "Is u in S?" can be answered by making few accesses to the memory. We study schemes for this problem that use space close to the information theoretic lower bound of $\Omega(n\log(\frac{m}{n}))$ bits and yet answer queries by reading a small number of bits of the memory.We show that, for $\epsilon > 0$, there is a scheme that stores $O(\frac{n}{\epsilon^2}\log m)$ bits and answers membership queries using a randomized algorithm that reads just one bit of memory and errs with probability at most $\epsilon$. We consider schemes that make no error for queries in S but are allowed to err with probability at most $\epsilon$ for queries not in S. We show that there exist such schemes that store $O((\frac{n}{\epsilon})^2 \log m)$ bits and answer queries using just one bitprobe. If multiple probes are allowed, then the number of bits stored can be reduced to $O(n^{1+\delta}\log m)$ for any $\delta > 0$. The schemes mentioned above are based on probabilistic constructions of set systems with small intersections.We show lower bounds that come close to our upper bounds (for a large range of n and $\epsilon$): Schemes that answer queries with just one bitprobe and error probability $\epsilon$ must use $\Omega(\frac{n}{\epsilon\log(1/\epsilon)} \log m)$ bits of storage; if the error is restricted to queries not in S, then the scheme must use $\Omega(\frac{n^2}{\epsilon^2 \log (n/\epsilon)}\log m)$ bits of storage. We also consider deterministic schemes for the static membership problem and show tradeoffs between space and the number of probes.

KW - bit probe model

KW - data structures

KW - set membership problem

KW - set systems

U2 - 10.1137/S0097539702405292

DO - 10.1137/S0097539702405292

M3 - Journal article

SN - 0097-5397

VL - 31

SP - 1723

EP - 1744

JO - S I A M Journal on Computing

JF - S I A M Journal on Computing

IS - 6

ER -