Generalized Deduplication: Bounds, Convergence, and Asymptotic Properties

    Publication: Contribution to book/anthology/report/proceeding › Conference contribution in proceedings › Research › peer-reviewed

    38 Citations (Scopus)

    Abstract

    We study a generalization of deduplication, which enables lossless deduplication of highly similar data, and show that classic deduplication with fixed chunk length is a special case. We provide bounds on the expected length of coded sequences for generalized deduplication and show that the coding has asymptotic near-entropy cost under the proposed source model. More importantly, we show that generalized deduplication allows for multiple orders of magnitude faster convergence than classic deduplication. This means that generalized deduplication can provide compression benefits much earlier than classic deduplication, which is key in practical systems. Numerical examples demonstrate our results, showing that our lower bounds are achievable, and illustrating the potential gain of using the generalization over classic deduplication. In fact, we show that even for a simple case of generalized deduplication, the gain in convergence speed is linear in the size of the data chunks.
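
    The abstract does not spell out the transform that generalized deduplication applies to each chunk, so the following is only a minimal sketch under stated assumptions: each 7-bit chunk is mapped to the nearest codeword of a Hamming(7,4) code (here called the "base") plus the position of the flipped bit (here called the "deviation"), and only bases are deduplicated. The choice of code, the chunk length, and all function names are illustrative and are not taken from the paper.

```python
N = 7  # assumed chunk length in bits (block length of a Hamming(7,4) code)


def split(chunk: int) -> tuple[int, int]:
    """Map a 7-bit chunk to (base, deviation).

    The syndrome is the XOR of the 1-indexed positions of all set bits.
    A non-zero syndrome names the single bit to flip to reach a codeword.
    """
    syndrome = 0
    for pos in range(1, N + 1):
        if (chunk >> (pos - 1)) & 1:
            syndrome ^= pos
    base = chunk ^ (1 << (syndrome - 1)) if syndrome else chunk
    return base, syndrome


def merge(base: int, deviation: int) -> int:
    """Invert split(): flip the recorded bit back (0 means no flip)."""
    return base ^ (1 << (deviation - 1)) if deviation else base


def generalized_dedup(chunks: list[int]) -> tuple[list[int], list[tuple[int, int]]]:
    """Store each distinct base once; keep (base_id, deviation) per chunk."""
    base_ids: dict[int, int] = {}
    refs = []
    for chunk in chunks:
        base, deviation = split(chunk)
        base_id = base_ids.setdefault(base, len(base_ids))
        refs.append((base_id, deviation))
    # dicts preserve insertion order, so list position equals base_id
    return list(base_ids), refs


def restore(bases: list[int], refs: list[tuple[int, int]]) -> list[int]:
    """Rebuild the original chunks exactly (the scheme is lossless)."""
    return [merge(bases[base_id], deviation) for base_id, deviation in refs]


# Eight chunks that differ from the all-zero codeword in at most one bit.
chunks = [0, 1, 2, 4, 8, 16, 32, 64]
bases, refs = generalized_dedup(chunks)
classic_dictionary_size = len(set(chunks))  # classic dedup stores each distinct chunk
```

    In this toy setting, classic deduplication has to see and store all eight distinct chunks before it can deduplicate any of them, while the sketch stores a single base after the first chunk. Each base covers N + 1 chunks, which is one way to read the abstract's remark that the gain in convergence speed grows linearly with the chunk size.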
    Original language: English
    Title of host publication: 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings
    Publisher: IEEE
    Publication date: Dec. 2019
    Article number: 9014012
    ISBN (Electronic): 978-1-7281-0962-6
    DOI
    Status: Published - Dec. 2019
    Event: IEEE Global Communications (GLOBECOM 2019) - Kona, Hawaii, USA
    Duration: 8 Dec. 2019 to 12 Dec. 2019

    Conference

    Conference: IEEE Global Communications (GLOBECOM 2019)
    Location: Kona, Hawaii
    Country/Territory: USA
    City: Kona
    Period: 8 Dec. 2019 to 12 Dec. 2019
