Aarhus University Seal

On hardness of several string indexing problems

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaperJournal articleResearchpeer-review

  • Kasper Green Larsen
  • J. Ian Munro, University of Waterloo
  • ,
  • Jesper Sindahl Nielsen
  • ,
  • Sharma V. Thankachan, University of Waterloo, Canada

Let D={d1, d2, ⋯, dD} be a collection of D string documents of n characters in total. The two-pattern matching problems ask to index D for answering the following queries efficiently. Report/count the unique documents containing P1 and P2. Report/count the unique documents containing P1, but not P2.Here P1 and P2 represent input patterns of length p1 and p2 respectively. Linear space data structures with O(p1+p2+√nk logO(1)n) query cost are already known for the reporting version, where k represents the output size. For the counting version (i.e., report the value k), a simple linear-space index with O(p1+p2+√n) query cost can be constructed in O(n3/2) time. However, it is still not known if these are the best possible bounds for these problems. In this paper, we show a strong connection between these string indexing problems and the boolean matrix multiplication problem. Based on this, we argue that these results cannot be improved significantly using purely combinatorial techniques. We also provide an improved upper bound for a related problem known as common colors query problem.

Original languageEnglish
JournalTheoretical Computer Science
Pages (from-to)74-82
Number of pages9
Publication statusPublished - 2015

    Research areas

  • Boolean matrix multiplication, Data structures, Document retrieval, Lower bounds, String searching

See relations at Aarhus University Citationformats

ID: 97723073