Abstract
Erasure coding in distributed multi-cloud data storage increases availability, durability and security, but it also makes data analytics inefficient: the whole dataset must be reconstructed to answer a query, even if the result set is a small fraction of the complete file. Data compression involves a similar trade-off: it reduces storage costs but requires the entire compressed data to be collected and decompressed to access even a few bytes. We propose TREAT, a novel method that combines erasure coding and compression to enable efficient queries on time series datasets while retaining the benefits of both underlying techniques. Our evaluation on five real-life datasets shows that TREAT can answer range queries up to 25 times faster, with 100 times less data transfer, than reconstructing the whole dataset.
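To illustrate why erasure-coded data generally has to be reconstructed before it can be queried, here is a minimal Python sketch of generic random linear network coding (RLNC), one of the listed keywords. It is not TREAT and does not show how TREAT avoids full reconstruction. Assumptions: arithmetic is done in the prime field GF(257) for simplicity (practical RLNC usually works over GF(2^8)), so every value must lie in 0..256; the function names `encode`/`decode` and the sample data are hypothetical.

```python
import random

# Assumption: arithmetic in the prime field GF(257) for simplicity;
# practical RLNC deployments commonly use GF(2^8) instead.
P = 257


def encode(sources, n, rng):
    """Return n coded packets, each a random linear combination of ALL k source symbols."""
    k = len(sources)
    length = len(sources[0])
    packets = []
    for _ in range(n):
        coeffs = [rng.randrange(P) for _ in range(k)]
        payload = [sum(c * s[j] for c, s in zip(coeffs, sources)) % P
                   for j in range(length)]
        packets.append((coeffs, payload))
    return packets


def decode(packets, k):
    """Gauss-Jordan elimination mod P over [coeffs | payload] rows.

    Returns the k source symbols, or None if the packets do not reach rank k.
    """
    rows = [list(c) + list(p) for c, p in packets]
    rank = 0
    for col in range(k):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # not enough independent packets: cannot reconstruct
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], P - 2, P)  # modular inverse via Fermat's little theorem
        rows[rank] = [(v * inv) % P for v in rows[rank]]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return [row[k:] for row in rows[:k]]


# Hypothetical time series split into k = 4 source symbols (all values < P).
sources = [[10, 20, 30], [40, 50, 60], [70, 80, 90], [100, 110, 120]]
rng = random.Random(42)
packets = encode(sources, n=6, rng=rng)  # e.g. one coded packet per storage node

print(decode(packets[:3], k=4))  # None: 3 packets can never reach rank 4
print(decode(packets, k=4))      # all 4 source symbols (with high probability)
```

Because every coded packet mixes all source symbols, recovering even one value from `sources[0]` in general requires k linearly independent packets. This is the full-reconstruction cost for range queries that the abstract describes.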
Field | Value |
---|---|
Original language | English |
Title of host publication | DEBS 2024 : Proceedings of the 18th ACM International Conference on Distributed and Event-Based Systems |
Number of pages | 12 |
Publisher | Association for Computing Machinery |
Publication date | Jul 2024 |
Pages | 147-158 |
ISBN (Electronic) | 979-8-4007-0443-7 |
DOIs | |
Publication status | Published - Jul 2024 |
Keywords
- Compression
- Distributed Storage
- Erasure coding
- Generalized Deduplication
- IoT
- Query
- RLNC
- Time series