PairwiseHist: Fast, Accurate, and Space-Efficient Approximate Query Processing with Data Compression

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Conference article › Research › peer-review

Abstract

Exponential growth in data collection is creating significant challenges for data storage and analytics latency. Approximate Query Processing (AQP) has long been touted as a solution for accelerating analytics on large datasets; however, there is still room for improvement across all key performance criteria. In this paper, we propose a novel histogram-based data synopsis called PairwiseHist that uses recursive hypothesis testing to ensure accurate histograms and can be built on top of data compressed using Generalized Deduplication (GD). We thus show that GD data compression can contribute to AQP. Compared to state-of-the-art AQP approaches, PairwiseHist achieves better performance across all key metrics, including 2.6× higher accuracy, 3.5× lower latency, 24× smaller synopses and 1.5–4× faster construction.

Original language: English
Journal: Proceedings of the VLDB Endowment
Volume: 17
Issue: 6
Pages (from-to): 1432-1445
Number of pages: 14
ISSN: 2150-8097
Publication status: Published - 3 May 2024
