Qi Zhang

GLEAN: Generalized Deduplication Enabled Approximate Edge Analytics

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review

Standard

GLEAN: Generalized Deduplication Enabled Approximate Edge Analytics. / Hurst, Aaron; Lucani Rötter, Daniel Enrique; Assent, Ira et al.
In: IEEE Internet of Things Journal, 2022.

BibTeX

@article{b3981c04c2f14114a9649b63e9a448e9,
title = "GLEAN: Generalized Deduplication Enabled Approximate Edge Analytics",
abstract = "The Internet of Things (IoT) has brought about exponential growth in sensor data. This has led to increasing demands for efficient and novel data transmission, storage and analytics solutions for sustainable IoT ecosystems. It has been shown that the Generalized Deduplication (GD) compression algorithm offers not only competitive compression ratio and throughput, but also random access properties that enable direct analytics of compressed data. In this paper, we thoroughly stress-test existing methods for direct analytics of GD compressed data with a diverse collection of 103 datasets, identify the need to optimise GD for analytics and develop a new version of GD to this end. We also propose the Generalized Deduplication Enabled Approximate Edge Analytics (GLEAN) framework. This framework applies the aforementioned analytics techniques at the Edge server to deliver end-to-end lossless data compression and high-quality Edge analytics in the IoT, thereby addressing challenges related to data transmission, storage and analytics. Impressive analytics performance was achieved using this framework, with a median increase in k-means clustering error of just 2% relative to analytics performed on uncompressed data, while running 7.5x faster and requiring 3.9x less storage at the Edge server compared to universal compressors.",
author = "Aaron Hurst and {Lucani R{\"o}tter}, {Daniel Enrique} and Ira Assent and Qi Zhang",
year = "2022",
language = "English",
journal = "IEEE Internet of Things Journal",
issn = "2327-4662",
publisher = "Institute of Electrical and Electronics Engineers",

}

RIS

TY - JOUR

T1 - GLEAN: Generalized Deduplication Enabled Approximate Edge Analytics

AU - Hurst, Aaron

AU - Lucani Rötter, Daniel Enrique

AU - Assent, Ira

AU - Zhang, Qi

PY - 2022

Y1 - 2022

N2 - The Internet of Things (IoT) has brought about exponential growth in sensor data. This has led to increasing demands for efficient and novel data transmission, storage and analytics solutions for sustainable IoT ecosystems. It has been shown that the Generalized Deduplication (GD) compression algorithm offers not only competitive compression ratio and throughput, but also random access properties that enable direct analytics of compressed data. In this paper, we thoroughly stress-test existing methods for direct analytics of GD compressed data with a diverse collection of 103 datasets, identify the need to optimise GD for analytics and develop a new version of GD to this end. We also propose the Generalized Deduplication Enabled Approximate Edge Analytics (GLEAN) framework. This framework applies the aforementioned analytics techniques at the Edge server to deliver end-to-end lossless data compression and high-quality Edge analytics in the IoT, thereby addressing challenges related to data transmission, storage and analytics. Impressive analytics performance was achieved using this framework, with a median increase in k-means clustering error of just 2% relative to analytics performed on uncompressed data, while running 7.5x faster and requiring 3.9x less storage at the Edge server compared to universal compressors.

AB - The Internet of Things (IoT) has brought about exponential growth in sensor data. This has led to increasing demands for efficient and novel data transmission, storage and analytics solutions for sustainable IoT ecosystems. It has been shown that the Generalized Deduplication (GD) compression algorithm offers not only competitive compression ratio and throughput, but also random access properties that enable direct analytics of compressed data. In this paper, we thoroughly stress-test existing methods for direct analytics of GD compressed data with a diverse collection of 103 datasets, identify the need to optimise GD for analytics and develop a new version of GD to this end. We also propose the Generalized Deduplication Enabled Approximate Edge Analytics (GLEAN) framework. This framework applies the aforementioned analytics techniques at the Edge server to deliver end-to-end lossless data compression and high-quality Edge analytics in the IoT, thereby addressing challenges related to data transmission, storage and analytics. Impressive analytics performance was achieved using this framework, with a median increase in k-means clustering error of just 2% relative to analytics performed on uncompressed data, while running 7.5x faster and requiring 3.9x less storage at the Edge server compared to universal compressors.

M3 - Journal article

JO - IEEE Internet of Things Journal

JF - IEEE Internet of Things Journal

SN - 2327-4662

ER -