Publication: Working paper › Research

- David P. Woodruff
- Qin Zhang

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008). In this model there are $k$ sites, each tracking its own input and communicating with a central coordinator that continuously maintains an approximate output of a function $f$ computed over the union of the inputs. The goal is to minimize the communication.

We show the randomized communication complexity of estimating the number of distinct elements up to a $1+\eps$ factor is $\Omega(k/\eps^2)$, improving the previous $\Omega(k + 1/\eps^2)$ bound and matching known upper bounds. For the $p$-th frequency moment $F_p$, $p > 1$, we improve the previous $\Omega(k + 1/\eps^2)$ communication bound to $\tilde{\Omega}(k^{p-1}/\eps^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems. We also show that we can estimate $F_p$, for any $p > 1$, using $\tilde{O}(k^{p-1}\poly(\eps^{-1}))$ communication. This drastically improves upon the previous $\tilde{O}(k^{2p+1}N^{1-2/p} \poly(\eps^{-1}))$ bound of Cormode, Muthukrishnan, and Yi for general $p$, and their $\tilde{O}(k^2/\eps + k^{1.5}/\eps^3)$ bound for $p = 2$. For $p = 2$, our bound resolves their main open question.
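
For reference, the quantities named above can be computed exactly by a naive baseline in which every site ships its whole input to the coordinator. The sketch below is only that baseline, not the protocol of the paper; it uses the standard definitions ($F_0$ is the number of distinct elements and $F_p = \sum_i f_i^p$, where $f_i$ is the frequency of item $i$ in the union of the inputs). The function names and the example inputs are hypothetical, and the additive form of the $\eps$ check is one common reading of "up to a $1+\eps$ factor", assumed here for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence


def merged_frequencies(site_streams: Sequence[Iterable[int]]) -> Counter:
    """Frequency of each item in the union (multiset sum) of all site inputs."""
    total: Counter = Counter()
    for stream in site_streams:
        total.update(stream)
    return total


def frequency_moment(site_streams: Sequence[Iterable[int]], p: float) -> float:
    """Exact F_p over the union of the inputs; p == 0 counts distinct elements."""
    freqs = merged_frequencies(site_streams)
    if p == 0:
        return float(len(freqs))
    return float(sum(count ** p for count in freqs.values()))


def is_eps_approximation(estimate: float, exact: float, eps: float) -> bool:
    """Assumed accuracy criterion: |estimate - exact| <= eps * exact."""
    return abs(estimate - exact) <= eps * exact


# Hypothetical inputs for k = 3 sites.
sites = [[1, 2, 2], [2, 3], [1, 1, 4]]
f0 = frequency_moment(sites, 0)  # distinct elements in the union
f2 = frequency_moment(sites, 2)  # second frequency moment
```

An approximate protocol would be judged against these exact values, for example with `is_eps_approximation(estimate, f2, eps)`, while communicating far less than the full inputs.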

Our lower bounds are based on new direct sum theorems for approximate majority, and they yield significant improvements for problems in the data stream model. We improve the bound for estimating $F_p$, $p > 2$, in $t$ passes from $\tilde{\Omega}(n^{1-2/p}/(\eps^{2/p} t))$ to $\tilde{\Omega}(n^{1-2/p}/(\eps^{4/p} t))$. We give the first bound for estimating $F_0$ in $t$ passes of $\Omega(1/(\eps^2 t))$ bits of space that does not use the gap-hamming problem. We also exhibit a distribution for the gap-hamming problem with high external information cost or super-polynomial communication, partly answering Question 25 in the Open Problems in Data Streams list.

Original language | English |
---|---|
Number of pages | 40 |
Status | Published - 2011 |

ID: 44544739