TY - UNPB
T1 - Bonsai: A General Look at Dual Deduplication
AU - Sehat, Hadi
AU - Lindskov Kloborg, Anders
AU - Mørup, Christian
AU - Pagnin, Elena
AU - Lucani Rötter, Daniel Enrique
PY - 2022
Y1 - 2022
N2 - Cloud Service Providers (CSPs) offer a vast amount of storage space at competitive prices to cope with the growing demand for digital data storage. Dual deduplication is a recent framework designed to improve data compression on the CSP while keeping clients’ data private from the CSP. To achieve this, clients perform lightweight information-theoretic transformations to their data prior to upload. We investigate the effectiveness of dual deduplication, and propose an improvement for the existing state-of-the-art method, named Yggdrasil. We name our proposal Bonsai as it aims at reducing storage fingerprint and improving scalability. Compared to Yggdrasil, Bonsai achieves (1) significant reduction in client storage, (2) reduction in the total required storage (client + CSP), and (3) reduction in the deduplication time on the CSP. Our experiments show that Bonsai achieves compression rates of 68% on the CSP and 5% on the client, while allowing the CSP to identify deduplications in a time-efficient manner. We also show that combining our method with universal compressors in the cloud, e.g., Brotli, can yield better overall compression on the data compared to only applying the universal compressor or plain Bonsai. Finally, we show that Bonsai provides sufficient privacy against an honest-but-curious CSP that knows the distribution of the clients’ original data.
AB - Cloud Service Providers (CSPs) offer a vast amount of storage space at competitive prices to cope with the growing demand for digital data storage. Dual deduplication is a recent framework designed to improve data compression on the CSP while keeping clients’ data private from the CSP. To achieve this, clients perform lightweight information-theoretic transformations to their data prior to upload. We investigate the effectiveness of dual deduplication, and propose an improvement for the existing state-of-the-art method, named Yggdrasil. We name our proposal Bonsai as it aims at reducing storage fingerprint and improving scalability. Compared to Yggdrasil, Bonsai achieves (1) significant reduction in client storage, (2) reduction in the total required storage (client + CSP), and (3) reduction in the deduplication time on the CSP. Our experiments show that Bonsai achieves compression rates of 68% on the CSP and 5% on the client, while allowing the CSP to identify deduplications in a time-efficient manner. We also show that combining our method with universal compressors in the cloud, e.g., Brotli, can yield better overall compression on the data compared to only applying the universal compressor or plain Bonsai. Finally, we show that Bonsai provides sufficient privacy against an honest-but-curious CSP that knows the distribution of the clients’ original data.
M3 - Working paper
BT - Bonsai: A General Look at Dual Deduplication
ER -