Direct Analytics of Generalized Deduplication Compressed IoT Data

Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review


Given the ever-increasing volume of data generated by the Internet of Things, data compression plays an essential role in reducing the cost of data transmission and storage. However, it also places a barrier, namely decompression, between users and the data-driven insights they require. We propose techniques for direct analytics of compressed data based on the Generalized Deduplication compression algorithm. When applied to data clustering, the accuracy of the proposed method differs by only 1–5% from that of analytics performed on the uncompressed data. At the same time, it runs four times faster, accesses only 14% as much data and, because the data remains compressed at all times, requires significantly less storage. These results show that many applications can reap the benefits of compression and of accurate, high-speed analytics simultaneously.
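The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general Generalized Deduplication idea, not the authors' implementation. It assumes integer sensor readings and a hypothetical transform that treats the low-order `deviation_bits` bits of each sample as the deviation and the remaining high-order bits as the base; `gd_compress`, `gd_decompress`, `base_histogram`, and `weighted_kmeans_1d` are illustrative names. Clustering then runs on the distinct bases, each weighted by how many samples reference it, which is one plausible way to analyse the data without decompressing it.

```python
from collections import Counter


def gd_compress(samples, deviation_bits):
    """Hypothetical GD transform: the high-order bits of each non-negative
    integer sample form the base, the low-order `deviation_bits` bits form
    the deviation. Each distinct base is stored once and referenced by id."""
    mask = (1 << deviation_bits) - 1
    base_ids = {}   # base value -> compact identifier (deduplicated bases)
    encoded = []    # one (base_id, deviation) pair per sample
    for s in samples:
        base, dev = s >> deviation_bits, s & mask
        bid = base_ids.setdefault(base, len(base_ids))
        encoded.append((bid, dev))
    return base_ids, encoded


def gd_decompress(base_ids, encoded, deviation_bits):
    """Lossless inverse: recombine each base with its stored deviation."""
    inverse = {bid: base for base, bid in base_ids.items()}
    return [(inverse[bid] << deviation_bits) | dev for bid, dev in encoded]


def base_histogram(base_ids, encoded, deviation_bits):
    """Map each distinct base (deviation bits zeroed) to the number of
    samples that reference it. Deviations are ignored entirely."""
    counts = Counter(bid for bid, _ in encoded)
    inverse = {bid: base for base, bid in base_ids.items()}
    return {inverse[bid] << deviation_bits: n for bid, n in counts.items()}


def weighted_kmeans_1d(points, weights, k, iters=50):
    """Lloyd's k-means on 1-D points where each point carries a weight
    (here: how many samples share that base)."""
    k = min(k, len(points))
    pts = sorted(points)
    # Deterministic start: k evenly spaced values from the sorted points.
    centers = [pts[i * (len(pts) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        sums, totals = [0.0] * k, [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: abs(p - centers[c]))
            sums[j] += w * p
            totals[j] += w
        # Empty clusters keep their previous center.
        new_centers = [sums[j] / totals[j] if totals[j] else centers[j]
                       for j in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


if __name__ == "__main__":
    # Two synthetic operating levels of an integer sensor reading.
    samples = [1000 + i % 7 for i in range(50)] + [5000 + i % 5 for i in range(50)]
    base_ids, encoded = gd_compress(samples, deviation_bits=4)
    assert gd_decompress(base_ids, encoded, 4) == samples  # lossless
    hist = base_histogram(base_ids, encoded, 4)
    print(len(samples), "samples ->", len(hist), "distinct bases")
    print(weighted_kmeans_1d(list(hist), list(hist.values()), k=2))  # prints: [992, 4992]
```

In this toy run the clustering step touches 2 weighted points instead of 100 raw samples. Because the deviation bits are discarded, cluster centers can only be as precise as the base representatives, so the choice of `deviation_bits` trades compression against that precision.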
Original language: English
Title of host publication: 2021 IEEE Global Communications Conference, GLOBECOM 2021 - Proceedings
Publication date: 2021
ISBN (Electronic): 9781728181042
Publication status: Published - 2021
Event: IEEE Conference and Exhibition on Global Telecommunications - Hybrid: In-Person and Virtual Conference, Madrid, Spain
Duration: 7 Dec 2021 - 11 Dec 2021
Conference number: 2021


Conference: IEEE Conference and Exhibition on Global Telecommunications
Location: Hybrid: In-Person and Virtual Conference
Series: IEEE Global Communications Conference (GLOBECOM)

Keywords
  • Internet of Things
  • clustering methods
  • data compression
  • data mining
  • explainable AI

