Bonsai: A General Look at Dual Deduplication

Hadi Sehat, Anders Lindskov Kloborg, Christian Mørup, Elena Pagnin, Daniel Enrique Lucani Rötter

Research output: Working paper/Preprint (Working paper; Research; peer-review)


Cloud Service Providers (CSPs) offer vast amounts of storage space at competitive prices to cope with the growing demand for digital data storage. Dual deduplication is a recent framework designed to improve data compression on the CSP while keeping clients’ data private from the CSP. To achieve this, clients apply lightweight information-theoretic transformations to their data prior to upload. We investigate the effectiveness of dual deduplication and propose an improvement over the existing state-of-the-art method, Yggdrasil. We name our proposal Bonsai, as it aims at reducing the storage footprint and improving scalability. Compared to Yggdrasil, Bonsai achieves (1) a significant reduction in client storage, (2) a reduction in the total required storage (client + CSP), and (3) a reduction in deduplication time on the CSP. Our experiments show that Bonsai achieves compression rates of 68% on the CSP and 5% on the client, while allowing the CSP to identify deduplications in a time-efficient manner. We also show that combining our method with universal compressors in the cloud, e.g., Brotli, can yield better overall compression than applying either the universal compressor or plain Bonsai alone. Finally, we show that Bonsai provides sufficient privacy against an honest-but-curious CSP that knows the distribution of the clients’ original data.
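The abstract does not spell out the client-side transformation. The toy Python sketch below illustrates only the general idea behind dual deduplication, under loudly stated assumptions. Each chunk is split with a fixed bit mask into a part the client keeps locally and a part it uploads. This mask is a hypothetical stand-in, not the transformation used by Yggdrasil or Bonsai. The CSP then deduplicates the uploaded parts by their SHA-256 hash. The names `ToyCSP`, `split_chunk`, `client_upload`, `restore`, `CHUNK_SIZE`, `UPLOAD_MASK` and `KEEP_MASK` are illustrative only.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 8      # bytes per chunk (hypothetical parameter)
UPLOAD_MASK = 0xF0  # high nibble of each byte is sent to the CSP (hypothetical)
KEEP_MASK = 0x0F    # low nibble of each byte stays on the client (hypothetical)


def split_chunk(chunk: bytes) -> Tuple[bytes, bytes]:
    """Split a chunk into the part uploaded to the CSP and the part the client keeps."""
    uploaded = bytes(b & UPLOAD_MASK for b in chunk)
    kept = bytes(b & KEEP_MASK for b in chunk)
    return uploaded, kept


class ToyCSP:
    """A toy CSP that stores each distinct uploaded part only once."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def upload(self, part: bytes) -> str:
        key = hashlib.sha256(part).hexdigest()
        self.store.setdefault(key, part)  # deduplication: identical parts share one entry
        return key


def client_upload(data: bytes, csp: ToyCSP) -> List[Tuple[str, bytes]]:
    """Upload each chunk's public part; return the client's local recipe (key, kept bits)."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        uploaded, kept = split_chunk(data[i:i + CHUNK_SIZE])
        recipe.append((csp.upload(uploaded), kept))
    return recipe


def restore(recipe: List[Tuple[str, bytes]], csp: ToyCSP) -> bytes:
    """Recombine the CSP-stored parts with the client-kept bits."""
    return b"".join(
        bytes(u | k for u, k in zip(csp.store[key], kept)) for key, kept in recipe
    )


if __name__ == "__main__":
    # Four chunks that differ only in the low-order bits the client keeps.
    data = b"".join(bytes([0x40 + i] * CHUNK_SIZE) for i in range(4))
    csp = ToyCSP()
    recipe = client_upload(data, csp)
    assert restore(recipe, csp) == data
    print(len(recipe), "chunks uploaded,", len(csp.store), "stored on the CSP")
```

In this toy run, the four chunks collapse to a single entry on the CSP because they differ only in bits the client retains. This illustrates how a client-side transformation can expose more duplicates to the CSP while the CSP never observes the kept bits. The sketch does not model the privacy analysis or the storage accounting described in the paper.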
Original language: English
Publication status: Submitted (2022)


