TABOO: Detecting unstructured sensitive information using recursive neural networks

Research output: Contribution to book/anthology/report/proceeding, Article in proceedings, Research, peer-review


Leakage of sensitive information from unstructured text documents is a costly problem for both government and industrial institutions. Traditional approaches to data leak prevention are commonly based on the hypothesis that sensitive information is reflected in the presence of distinct sensitive words. However, for complex sensitive information, this hypothesis may not hold. Our TABOO system detects complex sensitive information in text documents by learning their semantic and syntactic structure. Our approach is based on natural language processing methods for paraphrase detection, and uses recursive neural networks to assign sensitivity scores to the semantic components of the sentence structure. The demonstration focuses on interactive detection of sensitive information with the TABOO system. Users may work with real documents, alter documents, or prepare free text, and submit them for detection. Users can run either the TABOO engine or traditional approaches and compare the results. They can also verify that single words change sensitivity according to context, which gives hands-on experience with complex cases of sensitive information.
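The idea of scoring every component of a sentence structure with a recursive neural network can be illustrated with a minimal sketch. This is not the TABOO implementation: the parse tree is written by hand, the parameters are random and untrained, and all names (`DIM`, `embed`, `compose`, `score`, `score_tree`) are hypothetical. It only shows the general mechanism, in which each phrase vector is composed from its two children and then mapped to a score between 0 and 1.

```python
import math
import random

DIM = 4  # hypothetical embedding size, chosen only for illustration
rng = random.Random(0)


def rand_vec(n):
    """Return n small random values; stands in for trained parameters."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


# Untrained, randomly initialised parameters (assumption: a real system learns these).
W = [rand_vec(2 * DIM) for _ in range(DIM)]  # composition matrix, DIM x 2*DIM
b = rand_vec(DIM)                            # composition bias
u = rand_vec(DIM)                            # scoring vector
EMBED = {}                                   # word -> vector, filled lazily


def embed(word):
    """Look up (or create) a random vector for a word."""
    if word not in EMBED:
        EMBED[word] = rand_vec(DIM)
    return EMBED[word]


def compose(left, right):
    """Combine two child vectors into a parent vector: tanh(W [left; right] + b)."""
    child = left + right  # list concatenation
    return [
        math.tanh(sum(w * x for w, x in zip(row, child)) + bias)
        for row, bias in zip(W, b)
    ]


def score(vec):
    """Map a node vector to a score in (0, 1) with a logistic unit."""
    z = sum(ui * xi for ui, xi in zip(u, vec))
    return 1.0 / (1.0 + math.exp(-z))


def score_tree(tree, results):
    """Walk a binary tree bottom-up; a tree is a word or a (left, right) pair.

    Appends (phrase, score) for every node and returns (vector, phrase).
    """
    if isinstance(tree, str):
        vec, phrase = embed(tree), tree
    else:
        left_vec, left_phrase = score_tree(tree[0], results)
        right_vec, right_phrase = score_tree(tree[1], results)
        vec = compose(left_vec, right_vec)
        phrase = left_phrase + " " + right_phrase
    results.append((phrase, score(vec)))
    return vec, phrase


# Hand-written binary parse of a made-up sentence.
tree = (("the", "merger"), ("closes", "friday"))
results = []
score_tree(tree, results)
for phrase, s in results:
    print(f"{s:.3f}  {phrase}")
```

Because a phrase's vector depends on both of its children, the same word can contribute to different scores in different contexts, which is the property a fixed list of sensitive words cannot capture. With random parameters the printed scores are meaningless; they would only become informative after training on labelled data.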

Original language: English
Title of host publication: Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017
Number of pages: 2
Publisher: IEEE Computer Society Press
Publication date: 16 May 2017
Article number: 7930091
ISBN (Electronic): 9781509065431
Publication status: Published - 16 May 2017
Event: 33rd IEEE International Conference on Data Engineering, ICDE 2017 - San Diego, United States
Duration: 19 Apr 2017 - 22 Apr 2017


Conference: 33rd IEEE International Conference on Data Engineering, ICDE 2017
Country: United States
City: San Diego
Series: Proceedings of the International Conference on Data Engineering


ID: 118499477