LeCo: Lightweight Compression via Learning Serial Correlations
Liu, Yihao, Zeng, Xinyu, Zhang, Huanchen
arXiv.org Artificial Intelligence
Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Although dictionary-based encodings that approach Shannon's entropy have been studied comprehensively, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to automatically remove the serial redundancy in a value sequence, achieving an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely used applications, we observe up to a 5.2x speedup on a data analytics query in the Arrow columnar execution engine and a 16% increase in RocksDB's throughput.
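To make the idea of removing serial redundancy concrete, the sketch below fits a simple model to a value sequence and stores only the small residuals between the model's predictions and the actual values. It is an illustration under stated assumptions, not the LeCo prototype: the function names (`fit_line`, `encode`, `decode_at`), the choice of a least-squares line, and the dictionary layout are all hypothetical. Passing a constant model (slope 0, intercept equal to the minimum value) gives a Frame-of-Reference-style encoding, which is one way to read the claim that FOR is a special case of the framework.

```python
import math


def fit_line(values):
    """Least-squares line through the points (i, values[i]).

    Returns (slope, intercept). Hypothetical helper for illustration.
    """
    n = len(values)
    if n < 2:
        return 0.0, float(values[0]) if n else 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def encode(values, slope, intercept):
    """Store model parameters plus non-negative residuals of fixed bit width."""
    preds = [math.floor(slope * i + intercept) for i in range(len(values))]
    residuals = [v - p for v, p in zip(values, preds)]
    bias = min(residuals)                 # shift so every stored delta is >= 0
    deltas = [r - bias for r in residuals]
    width = max(deltas).bit_length()      # bits needed per stored delta
    return {"slope": slope, "intercept": intercept,
            "bias": bias, "width": width, "deltas": deltas}


def decode_at(enc, i):
    """Random access: recompute the prediction for position i, add the delta."""
    pred = math.floor(enc["slope"] * i + enc["intercept"])
    return pred + enc["bias"] + enc["deltas"][i]


# A nearly linear sequence: strong serial correlation between neighbours.
values = [1000 + 3 * i + (i % 2) for i in range(16)]

# Constant model (assumed stand-in for Frame-of-Reference).
enc_for = encode(values, 0.0, float(min(values)))

# Fitted linear model (assumed stand-in for a learned model).
enc_lin = encode(values, *fit_line(values))

print("bits per value, constant model:", enc_for["width"])
print("bits per value, linear model:  ", enc_lin["width"])
```

Because each value is recovered from the model parameters and a single fixed-width delta, decoding position `i` does not require scanning earlier values, which is the property that makes random access cheap in this kind of scheme.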
Nov-22-2023
- Country:
  - North America > United States > Colorado (0.14)
- Genre:
  - Research Report (0.50)
- Technology:
  - Information Technology
  - Artificial Intelligence
  - Cognitive Science > Problem Solving (0.46)
  - Machine Learning (1.00)
  - Natural Language > Information Retrieval
  - Query Processing (0.46)
  - Representation & Reasoning
  - Optimization (0.67)
  - Search (0.67)
  - Communications (0.93)
  - Data Science (1.00)
  - Databases (1.00)
  - Information Management (1.00)