Alexandria: A Proof-of-concept Implementation and Evaluation of Generalised Data Deduplication

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper, Journal article, Research, peer-review

Standard

Alexandria: A Proof-of-concept Implementation and Evaluation of Generalised Data Deduplication. / Nielsen, Lars; Vestergaard, Rasmus; Yazdani, Niloofar; Talasila, Siva Rama Krishna Prasad; Lucani Rötter, Daniel Enrique; Sipos, Marton.

In: Globecom. I E E E Conference and Exhibition, 2019.

Bibtex

@article{c3e2315bb3764beeb761778477d06385,
title = "Alexandria: A Proof-of-concept Implementation and Evaluation of Generalised Data Deduplication",
abstract = "The amount of data generated worldwide is expected to grow from 33 to 175 ZB by 2025 in part driven by the growth of Internet of Things (IoT) and cyber-physical systems (CPS). To cope with this enormous amount of data, new cloud storage techniques must be developed. Generalised Data Deduplication (GDD) is a new paradigm for reducing the cost of storage by systematically identifying near-identical data chunks, storing their common component once, and a compact representation of the deviation to the original chunk for each chunk. This paper presents a system architecture for GDD and a proof-of-concept implementation. We evaluated the compression gain of Generalised Data Deduplication using three data sets of varying size and content and compared to the performance of the EXT4 and ZFS file systems, where the latter employs classic deduplication. We show that Generalised Data Deduplication provides up to 16.75{\%} compression gain compared to both EXT4 and ZFS with data sets with less than 5 GB of data.",
author = "Lars Nielsen and Rasmus Vestergaard and Niloofar Yazdani and Talasila, {Siva Rama Krishna Prasad} and {Lucani R{\"o}tter}, {Daniel Enrique} and Marton Sipos",
year = "2019",
language = "English",
journal = "Globecom. I E E E Conference and Exhibition",
issn = "1930-529X",
publisher = "IEEE",

}

RIS

TY - JOUR

T1 - Alexandria: A Proof-of-concept Implementation and Evaluation of Generalised Data Deduplication

AU - Nielsen, Lars

AU - Vestergaard, Rasmus

AU - Yazdani, Niloofar

AU - Talasila, Siva Rama Krishna Prasad

AU - Lucani Rötter, Daniel Enrique

AU - Sipos, Marton

PY - 2019

Y1 - 2019

N2 - The amount of data generated worldwide is expected to grow from 33 to 175 ZB by 2025 in part driven by the growth of Internet of Things (IoT) and cyber-physical systems (CPS). To cope with this enormous amount of data, new cloud storage techniques must be developed. Generalised Data Deduplication (GDD) is a new paradigm for reducing the cost of storage by systematically identifying near-identical data chunks, storing their common component once, and a compact representation of the deviation to the original chunk for each chunk. This paper presents a system architecture for GDD and a proof-of-concept implementation. We evaluated the compression gain of Generalised Data Deduplication using three data sets of varying size and content and compared to the performance of the EXT4 and ZFS file systems, where the latter employs classic deduplication. We show that Generalised Data Deduplication provides up to 16.75% compression gain compared to both EXT4 and ZFS with data sets with less than 5 GB of data.

AB - The amount of data generated worldwide is expected to grow from 33 to 175 ZB by 2025 in part driven by the growth of Internet of Things (IoT) and cyber-physical systems (CPS). To cope with this enormous amount of data, new cloud storage techniques must be developed. Generalised Data Deduplication (GDD) is a new paradigm for reducing the cost of storage by systematically identifying near-identical data chunks, storing their common component once, and a compact representation of the deviation to the original chunk for each chunk. This paper presents a system architecture for GDD and a proof-of-concept implementation. We evaluated the compression gain of Generalised Data Deduplication using three data sets of varying size and content and compared to the performance of the EXT4 and ZFS file systems, where the latter employs classic deduplication. We show that Generalised Data Deduplication provides up to 16.75% compression gain compared to both EXT4 and ZFS with data sets with less than 5 GB of data.

M3 - Journal article

JO - Globecom. I E E E Conference and Exhibition

JF - Globecom. I E E E Conference and Exhibition

SN - 1930-529X

ER -