Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review

- https://doi.org/10.1137/1.9781611975994.36
Final published version

- Peyman Afshani
- Ingo van Duijn, Aalborg University Hospital ,
- Rasmus Killmann
- Jesper Sindahl Nielsen

In this paper we study lower bounds for a variant of jumbled-indexing problem: given an input string S on an alphabet Σ = {σ_{1}, . . ., σ_{λ}}, store it in a data structure such that given frequencies f_{1}, · · ·, f_{λ}, one can find all the substrings S^{0} of S where the frequency of the character σ_{i} is f_{i}, for 1 ≤ i ≤ λ. This is a very interesting and challenging text indexing problem and it has received significant attention lately, both in the string-indexing community as well as in the theory community. It is known that when λ = 2, one can use a linear-size data structure to decide whether there exists a substring that matches the query in constant time [10]. It is also known [1] that for constant λ ≥ 3, then there exists a constant α_{λ} such that it is not possible to achieve both O(n^{2−}α^{λ}) preprocessing time and O(n^{1−}α^{λ}) query time. We study a variant of the problem where the goal is to report all the substrings of S that match a given jumbled-indexing query. Assuming the data structure operates in the pointer machine model, we prove unconditional space lower bounds that also apply to binary alphabets: we show that if the data structure has the query time of O(n^{0.5−}o^{(1)} + k), where k is the output size of the query, then it must consume Ω(n^{2−}o^{(1)}) space. The o(1) term in our lower bound is at most _{log(10)}^{1}_{n}, where log^{(}·^{)} notation denotes the iterative log function (i.e., log^{(10)} n = log log log log log log log log log log n). This result has a number of interesting consequences. First, it shows that reporting all the matches is significantly harder than deciding whether a match exists, at least for the binary alphabet. This was surprising for us since we believed that in order for jumbled-indexing to be difficult, the alphabet size must be at least 3. Second, from a technical point of view, we make connections between this problem and the area of additive combinatorics as well as random walks. Previously, Chan and Lewenstein [6] had shown the connections between this problem and the area of additive combinatorics from an upper bound point of view, however, we use fundamental theorems from additive combinatorics but these tools are quite different from the theorems used in [6]. In our opinion, the fact that additive combinatorics appears in our lower bound approach as well as in the upper bound approach of Chan and Lewenstein [6] is noteworthy and demands further investigation. We believe our results open up a few new venues for research. For example, we believe it is interesting to study whether it is possible to build a data structure that uses O(n) space s.t., it can report all the matches to a jumble-indexing query in O(n^{2}/^{3} + k) time; this is even interesting for a binary alphabet.

Original language | English |
---|---|

Title of host publication | Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms |

Editors | Shuchi Chawla |

Number of pages | 15 |

Publisher | Association for Computing Machinery |

Publication year | 2020 |

Pages | 592-606 |

ISBN (Electronic) | 9781611975994 |

DOIs | |

Publication status | Published - 2020 |

Event | 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020 - Salt Lake City, United States Duration: 5 Jan 2020 → 8 Jan 2020 |

Conference | 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020 |
---|---|

Land | United States |

By | Salt Lake City |

Periode | 05/01/2020 → 08/01/2020 |

Sponsor | ACM Special Interest Group on Algorithms and Computation Theory (SIGACT), SIAM Activity Group on Discrete Mathematics |

See relations at Aarhus University Citationformats

ID: 187165154