The Lossy Horizon: Error-Bounded Predictive Coding for Lossy Text Compression (Episode I)
Aghanya, Nnamdi, Li, Jun, Wang, Kewei
arXiv.org Artificial Intelligence
Large Language Models (LLMs) can achieve near-optimal lossless compression by acting as powerful probability models. We investigate their use in the lossy domain, where reconstruction fidelity is traded for higher compression ratios. This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text codec that leverages a Masked Language Model (MLM) as a decompressor. Instead of storing a subset of original tokens, EPC allows the model to predict masked content and stores minimal, rank-based corrections only when the model's top prediction is incorrect. This creates a residual channel that offers continuous rate-distortion control. We compare EPC to a simpler Predictive Masking (PM) baseline and a transform-based Vector Quantisation with a Residual Patch (VQ+RE) approach. Through an evaluation that includes precise bit accounting and rate-distortion analysis, we demonstrate that EPC consistently dominates PM, offering superior fidelity at a significantly lower bit rate by more efficiently utilising the model's intrinsic knowledge.
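The rank-based correction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The masked language model is replaced by a toy, context-free frequency ranker, and the names `make_toy_ranker`, `encode`, `decode`, and the `budget` parameter are all hypothetical, introduced only for illustration. The `budget` cap is an assumed stand-in for a rate-distortion knob: dropping corrections lowers the stored size at the cost of reconstruction errors.

```python
from collections import Counter


def make_toy_ranker(corpus_tokens):
    """Build a stand-in for the MLM: ranks candidates by corpus frequency.

    Assumption: a real codec would query a masked language model here and the
    ranking would depend on the context; this toy ranker ignores context.
    """
    counts = Counter(corpus_tokens)
    # Deterministic order: most frequent first, ties broken alphabetically.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))

    def rank_candidates(context, position):
        return ranked

    return rank_candidates


def encode(tokens, masked_positions, rank_candidates, budget=None):
    """Keep unmasked tokens; for masked ones store a rank only on a miss."""
    kept = {i: t for i, t in enumerate(tokens) if i not in masked_positions}
    corrections = {}
    for pos in sorted(masked_positions):
        candidates = rank_candidates(kept, pos)
        rank = candidates.index(tokens[pos])
        if rank != 0:  # top prediction is wrong, so a correction is needed
            corrections[pos] = rank
    if budget is not None:
        # Hypothetical lossy knob: keep only the first `budget` corrections.
        corrections = dict(sorted(corrections.items())[:budget])
    return kept, corrections


def decode(length, kept, corrections, rank_candidates):
    """Rebuild the sequence from kept tokens, predictions, and corrections."""
    out = []
    for pos in range(length):
        if pos in kept:
            out.append(kept[pos])
        else:
            candidates = rank_candidates(kept, pos)
            # Rank 0 (the top prediction) is used when no correction exists.
            out.append(candidates[corrections.get(pos, 0)])
    return out


tokens = "the cat sat on the mat the end".split()
ranker = make_toy_ranker(tokens)
kept, corrections = encode(tokens, {0, 2, 4}, ranker)
restored = decode(len(tokens), kept, corrections, ranker)
```

In this toy run only one of the three masked positions needs a stored correction, because the ranker's top prediction already matches the other two. Passing `budget=0` drops that correction and the decoder falls back to the top prediction, which is the lossy case.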
Oct-28-2025
- Country:
- Europe > United Kingdom (0.04)
- North America > Canada
- Genre:
- Research Report (0.42)
- Industry:
- Law > Litigation (0.62)
- Technology: