AITopics | compactor

Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores

Chari, Vivek, Van Durme, Benjamin

arXiv.org Artificial Intelligence, Dec-10-2025

Modern Large Language Models (LLMs) are increasingly trained to support very large context windows. Unfortunately, using long contexts during generation is complicated by the large memory requirement of the KV cache, which scales linearly with context length. This memory footprint is often the dominant resource bottleneck in real-world deployments, limiting throughput and increasing serving costs. One way to address this is to compress the KV cache, either with knowledge of the question being asked (query-aware) or without it (query-agnostic). We present Compactor, a training-free, query-agnostic KV compression strategy that uses approximate leverage scores to determine token importance. We show that Compactor matches the performance of competing methods while retaining 20% fewer tokens on both synthetic and real-world context tasks, and is far more task-robust. We further introduce a procedure for context-calibrated compression: inferring the maximum compression a given context supports before significant performance loss. Using context-calibrated compression, we show that Compactor achieves full-KV performance on LongBench while reducing the KV memory burden by 68% on average. To demonstrate the efficacy and generalizability of our approach, we apply Compactor to 27 synthetic and real-world tasks from RULER and LongBench, with models from both the Qwen 2.5 and Llama 3.1 families.
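
To make the idea of leverage-score token importance concrete, here is a minimal Python sketch. It computes exact leverage scores of a tiny key matrix by orthonormalizing its columns and keeps the highest-scoring tokens. This is an illustration only: the abstract says Compactor uses approximate leverage scores, and the function names `leverage_scores` and `keep_top_tokens`, the `keep_ratio` parameter, and the toy data are assumptions made for this example, not the authors' implementation.

```python
import math


def leverage_scores(rows):
    """Exact leverage scores of the rows of a tall matrix (list of equal-length lists).

    Assumption for illustration: exact scores via modified Gram-Schmidt,
    not the approximate scores mentioned in the abstract.
    """
    n, d = len(rows), len(rows[0])
    basis = []  # orthonormal basis of the column space, one length-n vector each
    for j in range(d):
        v = [rows[i][j] for i in range(n)]
        original_norm = math.sqrt(sum(a * a for a in v))
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10 * original_norm:  # skip zero or (near-)dependent columns
            basis.append([a / norm for a in v])
    # The leverage of row i is the squared norm of row i of the basis matrix.
    return [sum(q[i] ** 2 for q in basis) for i in range(n)]


def keep_top_tokens(keys, values, keep_ratio):
    """Keep the fraction `keep_ratio` of tokens with the largest leverage scores."""
    scores = leverage_scores(keys)
    n_keep = max(1, round(keep_ratio * len(keys)))
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore original token order
    return [keys[i] for i in kept], [values[i] for i in kept], kept


# Toy cache: tokens 0 and 1 share a key direction, token 2 has a unique one.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
scores = leverage_scores(keys)
kept_keys, kept_values, kept = keep_top_tokens(keys, values, keep_ratio=1 / 3)
print(scores, kept)
```

In this toy case the token with the unique key direction receives the largest score, so it is the one retained. The scores sum to the rank of the key matrix, which is a convenient sanity check.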

large language model, machine learning, natural language, (19 more...)

2507.08143

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Weight-Inherited Distillation for Task-Agnostic BERT Compression

Wu, Taiqiang, Hou, Cheng, Zhao, Zhe, Lao, Shanshan, Li, Jiayi, Wong, Ngai, Yang, Yujiu

arXiv.org Artificial Intelligence, May-15-2023

Knowledge Distillation (KD) is a predominant approach for BERT compression. Previous KD-based methods focus on designing extra alignment losses so that the student model mimics the behavior of the teacher model, which transfers knowledge only indirectly. In this paper, we propose Weight-Inherited Distillation (WID), a novel method that transfers knowledge from the teacher directly. WID requires no additional alignment loss and trains a compact student by inheriting the weights, offering a new perspective on knowledge distillation. Specifically, we design row compactors and column compactors as mappings and then compress the weights via structural re-parameterization. Experimental results on the GLUE and SQuAD benchmarks show that WID outperforms previous state-of-the-art KD-based baselines. Further analysis indicates that WID can also learn the attention patterns of the teacher model without any alignment loss on attention distributions.
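
One way to picture row and column compactors as mappings is as a pair of matrices multiplied onto a weight matrix from the left and the right. The Python sketch below is a rough illustration under stated assumptions: the compactors here are fixed selection matrices chosen by hand, whereas the abstract describes them as designed mappings that are trained; the names `matmul`, `row_compactor`, and `col_compactor` are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Teacher weight: 3 outputs x 4 inputs.
weight = [[1.0, 2.0, 3.0, 4.0],
          [5.0, 6.0, 7.0, 8.0],
          [9.0, 10.0, 11.0, 12.0]]

# Hypothetical compactors (assumption: simple selections, not learned mappings).
row_compactor = [[1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]]   # 2 x 3, keeps output rows 0 and 2
col_compactor = [[0.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 0.0],
                 [0.0, 1.0]]        # 4 x 2, keeps input columns 1 and 3

# Re-parameterization: fold both compactors into one smaller weight matrix.
compact_weight = matmul(matmul(row_compactor, weight), col_compactor)

# The folded weight gives the same result as applying the three maps in turn.
x_small = [[1.0], [-1.0]]
explicit = matmul(row_compactor, matmul(weight, matmul(col_compactor, x_small)))
merged = matmul(compact_weight, x_small)
print(compact_weight, explicit, merged)
```

Because matrix multiplication is associative, the merged 2 x 2 weight can replace the three-matrix chain at inference time, which is the sense in which the student inherits compressed weights in this sketch.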

artificial intelligence, machine learning, natural language, (17 more...)

2305.09098

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(15 more...)

Genre: Research Report (0.82)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees

Schiefer, Nicholas, Chen, Justin Y., Indyk, Piotr, Narayanan, Shyam, Silwal, Sandeep, Wagner, Tal

arXiv.org Artificial Intelligence, Apr-15-2023

An $\varepsilon$-approximate quantile sketch over a stream of $n$ inputs approximates the rank of any query point $q$ (that is, the number of input points less than $q$) up to an additive error of $\varepsilon n$, generally with probability at least $1 - 1/\mathrm{poly}(n)$, while consuming $o(n)$ space. While the celebrated KLL sketch of Karnin, Lang, and Liberty is a provably optimal quantile approximation algorithm over worst-case streams, the approximations it achieves in practice are often far from optimal. Indeed, the most commonly used technique in practice is Dunning's t-digest, which often achieves much better approximations than KLL on real-world data but is known to have arbitrarily large errors in the worst case. We apply interpolation techniques to the streaming quantiles problem, aiming for better approximations than KLL on real-world data sets while maintaining similar worst-case guarantees.
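
The rank definition and the additive $\varepsilon n$ error target can be checked on a small example. The Python sketch below builds an offline summary that keeps every `step`-th order statistic and compares its rank estimates with exact ranks. It is not KLL, t-digest, or the interpolation method of the paper, and it is not a streaming algorithm, since it sorts all the data first; the names `build_summary` and `approx_rank` are assumptions made for this example.

```python
import bisect
import random


def exact_rank(sorted_data, q):
    """Number of input points strictly less than q."""
    return bisect.bisect_left(sorted_data, q)


def build_summary(data, eps):
    """Keep every step-th smallest point, where step = floor(eps * n), at least 1."""
    ordered = sorted(data)
    step = max(1, int(eps * len(ordered)))
    return ordered[step - 1::step], step


def approx_rank(summary, step, q):
    """Each kept point stands in for `step` input points."""
    return step * bisect.bisect_left(summary, q)


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]
eps = 0.01
ordered = sorted(data)
summary, step = build_summary(data, eps)

queries = [-2.0, -0.5, 0.0, 0.5, 2.0]
worst = max(abs(approx_rank(summary, step, q) - exact_rank(ordered, q)) for q in queries)
print(len(summary), worst, eps * len(data))
```

The estimate equals the exact rank rounded down to a multiple of `step`, so its error is always below `step`, which is at most $\varepsilon n$. A streaming sketch has to reach a comparable guarantee without ever holding the full sorted input.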

artificial intelligence, compactor, machine learning, (16 more...)

2304.07652

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Lossless CNN Channel Pruning via Gradient Resetting and Convolutional Re-parameterization

Ding, Xiaohan, Hao, Tianxiang, Liu, Ji, Han, Jungong, Guo, Yuchen, Ding, Guiguang

arXiv.org Machine Learning, Sep-1-2020

Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn for efficiency. By training the re-parameterized model using regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity, enabling us to equivalently convert the re-parameterized model into the original architecture with narrower layers.

However, as CNN's representational capacity depends on the width of conv layers, it is difficult to reduce the width without performance drops. On practical CNN architectures like ResNet-50 [16] and large-scale datasets like ImageNet [6], lossless pruning with high compression ratio has long been considered challenging. For reasonable tradeoff between compression ratio and performance, a typical paradigm (Figure 1.A) [2, 3, 9, 30, 33, 56, 57] seeks to train the model with magnitude-related penalty loss (e.g., group Lasso [51, 54]) on the conv kernels to produce structured …
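
The claim that structured sparsity allows an equivalent conversion to narrower layers can be illustrated with a toy network. In the Python sketch below, a hidden channel whose weights are all zero is removed together with the matching input column of the next layer, and the output is unchanged. The sketch assumes bias-free fully connected layers standing in for convolutions, and it does not implement the gradient-resetting training rule; `prune_zero_channels` is a hypothetical helper name.

```python
def linear(weights, x):
    """Bias-free fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def forward(w1, w2, x):
    return linear(w2, relu(linear(w1, x)))


def prune_zero_channels(w1, w2):
    """Drop hidden channels whose w1 row is all zero, and the matching w2 columns."""
    keep = [i for i, row in enumerate(w1) if any(v != 0.0 for v in row)]
    w1_narrow = [w1[i] for i in keep]
    w2_narrow = [[row[i] for i in keep] for row in w2]
    return w1_narrow, w2_narrow


# Hidden channel 1 has been driven to exactly zero (structured sparsity).
w1 = [[1.0, -2.0], [0.0, 0.0], [0.5, 3.0]]
w2 = [[2.0, 7.0, -1.0], [0.0, 4.0, 1.0]]
x = [1.0, 2.0]

w1_narrow, w2_narrow = prune_zero_channels(w1, w2)
wide_out = forward(w1, w2, x)
narrow_out = forward(w1_narrow, w2_narrow, x)
print(wide_out, narrow_out)
```

A zeroed channel always produces a zero activation, so whatever the next layer multiplies it by contributes nothing. Removing both sides of that connection therefore narrows the layer without changing the function in this bias-free setting.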

artificial intelligence, machine learning, pruning, (18 more...)

2007.0326

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Beijing > Beijing (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Discrepancy, Coresets, and Sketches in Machine Learning

Karnin, Zohar, Liberty, Edo

arXiv.org Machine Learning, Jun-11-2019

This paper defines the notion of class discrepancy for families of functions. It shows that low discrepancy classes admit small offline and streaming coresets. We provide general techniques for bounding the class discrepancy of machine learning problems. As corollaries of the general technique we bound the discrepancy (and therefore coreset complexity) of logistic regression, sigmoid activation loss, matrix covariance, kernel density and any analytic function of the dot product or the squared distance. Our results prove the existence of $\varepsilon$-approximation $O(\sqrt{d}/\varepsilon)$ sized coresets for the above problems. This resolves the long-standing open problem regarding the coreset complexity of Gaussian kernel density estimation. We provide two more related but independent results. First, an exponential improvement of the widely used merge-and-reduce trick which gives improved streaming sketches for any low discrepancy problem. Second, an extremely simple deterministic algorithm for finding low discrepancy sequences (and therefore coresets) for any positive semi-definite kernel. This paper establishes some explicit connections between class discrepancy, coreset complexity, learnability, and streaming algorithms.
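
For readers unfamiliar with the merge-and-reduce trick mentioned above, here is a minimal Python sketch of the classic version applied to rank estimation. Equal-level blocks are merged and then halved by keeping every other item, which doubles the weight of the survivors. This shows the baseline trick only, not the exponential improvement the abstract announces; the class name `MergeReduceSketch` and the `block_size` parameter are assumptions made for this example.

```python
import bisect
import random


class MergeReduceSketch:
    """Classic merge-and-reduce over fixed-size blocks (illustrative only)."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.current = []   # incoming items, each of weight 1
        self.levels = {}    # level -> sorted block whose items weigh 2**level

    def add(self, x):
        self.current.append(x)
        if len(self.current) == self.block_size:
            block, level = sorted(self.current), 0
            self.current = []
            # Carry upward like binary addition: merge two blocks, keep half.
            while level in self.levels:
                merged = sorted(self.levels.pop(level) + block)
                block = merged[1::2]   # reduce: every other item survives
                level += 1
            self.levels[level] = block

    def rank(self, q):
        """Estimated number of inserted items strictly less than q."""
        r = sum(1 for x in self.current if x < q)
        for level, block in self.levels.items():
            r += (2 ** level) * bisect.bisect_left(block, q)
        return r

    def total_weight(self):
        return len(self.current) + sum(
            (2 ** level) * len(block) for level, block in self.levels.items())


random.seed(1)
data = [random.random() for _ in range(1024)]
sketch = MergeReduceSketch(block_size=32)
for x in data:
    sketch.add(x)

queries = [0.1, 0.25, 0.5, 0.75, 0.9]
errors = [abs(sketch.rank(q) - sum(1 for x in data if x < q)) for q in queries]
print(sketch.total_weight(), max(errors))
```

Each merge at level $l$ changes any rank estimate by at most $2^l$, so with 1024 items and blocks of 32 the total error is at most 80 while only 32 items are stored. The accumulation of that error across levels is the cost that improvements to merge-and-reduce aim to shrink.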

artificial intelligence, coreset, machine learning, (14 more...)

1906.04845

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
