A Randomly Accessible Lossless Compression Scheme for Time-Series Data

    Publication: Contribution to book/anthology/report/proceeding › Conference contribution in proceedings › Research › peer review

    Abstract

    We detail a practical compression scheme for lossless compression of time-series data, based on the emerging concept of generalized deduplication. As data is no longer stored for just archival purposes, but needs to be continuously accessed in many applications, the scheme is designed for low-cost random access to its compressed data, avoiding decompression. With this method, an arbitrary bit of the original data can be read by accessing only a few hundred bits in the worst case, several orders of magnitude fewer than state-of-the-art compression schemes. Subsequent retrieval of bits requires visiting at most a few tens of bits. A comprehensive evaluation of the compressor on eight real-life data sets from various domains is provided. The cost of this random access capability is a loss in compression ratio compared with the state-of-the-art compression schemes BZIP2 and 7z, which can be as low as 5% depending on the data set. Compared to GZIP, the proposed scheme has a better compression ratio for most of the data sets. Our method has massive potential for applications requiring frequent random accesses, as the only existing approach with comparable random access cost is to store the data without compression.
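
    The abstract does not spell out the mechanics of the scheme, so the following is only a minimal sketch of the general idea behind generalized deduplication, not the authors' implementation. It assumes fixed-width integer samples, and it assumes that each sample is split into a base (the high-order bits) and a deviation (the low-order bits). Bases are deduplicated into a dictionary, while deviations are stored verbatim. The names `CHUNK_BITS`, `DEV_BITS`, `compress`, `read_sample`, and `read_bit` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

CHUNK_BITS = 16  # assumed sample width in bits (hypothetical parameter)
DEV_BITS = 4     # assumed number of low-order bits kept as the deviation


def compress(samples: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split each sample into (base, deviation) and deduplicate the bases."""
    bases: List[int] = []             # unique bases, in first-seen order
    base_index: Dict[int, int] = {}   # base value -> position in `bases`
    ids: List[int] = []               # per-sample ID of its base
    devs: List[int] = []              # per-sample deviation, stored as-is
    mask = (1 << DEV_BITS) - 1
    for s in samples:
        if not 0 <= s < (1 << CHUNK_BITS):
            raise ValueError(f"sample {s} does not fit in {CHUNK_BITS} bits")
        base, dev = s >> DEV_BITS, s & mask
        if base not in base_index:
            base_index[base] = len(bases)
            bases.append(base)
        ids.append(base_index[base])
        devs.append(dev)
    return bases, ids, devs


def read_sample(bases: List[int], ids: List[int], devs: List[int], i: int) -> int:
    """Rebuild sample i from one base lookup and one deviation, nothing else."""
    return (bases[ids[i]] << DEV_BITS) | devs[i]


def read_bit(bases: List[int], ids: List[int], devs: List[int], bit_pos: int) -> int:
    """Return one bit of the original stream (MSB-first within each sample)."""
    i, offset = divmod(bit_pos, CHUNK_BITS)
    sample = read_sample(bases, ids, devs, i)
    return (sample >> (CHUNK_BITS - 1 - offset)) & 1
```

    In this sketch, reading any one sample or bit touches a single base ID, a single base, and a single deviation, which is the kind of locality that makes random access cheap. Neighbouring time-series samples tend to share a base, so the dictionary stays small. A real encoder would additionally pack the IDs and deviations into bit fields, a step this sketch leaves out.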
    Original language: English
    Title: INFOCOM 2020 - IEEE Conference on Computer Communications
    Number of pages: 10
    Publisher: IEEE
    Publication date: 2020
    Pages: 2145-2154
    Article number: 9155450
    ISBN (Electronic): 9781728164120
    Status: Published - 2020
    Event: IEEE Conference on Computer Communications - Toronto, Canada
    Duration: 6 Jul 2020 to 9 Jul 2020

    Conference

    Conference: IEEE Conference on Computer Communications
    Country/Territory: Canada
    City: Toronto
    Period: 06/07/2020 to 09/07/2020
