compress
DynaBERT: Dynamic BERT with Adaptive Width and Depth
The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on BERT compression usually compress the large BERT model to a fixed smaller size, and can not fully satisfy the requirements of different edge devices with various hardware performances. In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust the size and latency by selecting adaptive width and depth. The training process of DynaBERT includes first training a width-adaptive BERT and then allowing both adaptive width and depth, by distilling knowledge from the full-sized model to small sub-networks. Network rewiring is also used to keep the more important attention heads and neurons shared by more sub-networks. Comprehensive experiments under various efficiency constraints demonstrate that our proposed dynamic BERT (or RoBERTa) at its largest size has comparable performance as BERT-base (or RoBERTa-base), while at smaller widths and depths consistently outperforms existing BERT compression methods.
Compressing Chemistry Reveals Functional Groups
We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle from computational learning theory: a good explanation of data should also compress the data. We introduce an unsupervised learning algorithm based on the Minimum Message Length (MML) principle that searches for substructures that compress around three million biologically relevant molecules. We demonstrate that the discovered substructures contain most human-curated functional groups as well as novel larger patterns with more specific functions. We also run our algorithm on 24 specific bioactivity prediction datasets to discover dataset-specific functional groups. Fingerprints constructed from dataset-specific functional groups are shown to significantly outperform other fingerprint representations, including the MACCS and Morgan fingerprint, when training ridge regression models on bioactivity regression tasks.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Materials > Chemicals (0.68)
BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
Sy, Yaya, Cerisara, Christophe, Illina, Irina
Pruning large pre-trained transformers for low-resource languages is challenging, as it often requires massive retraining data to recover performance. For instance, Distill-Whisper prunes Whisper by 40% and retrains on 21,000 hours of speech, far beyond what is available for most languages. Can Whisper be made lighter and faster for edge devices in data-scarce settings? Focusing on Bambara with only 32h of speech-to-text data, we propose a new pruning recipe. Instead of vocabulary pruning, which is unsuitable due to frequent code-switching by Bambara speakers, we compress the embeddings with low-rank decomposition and feature distillation. Rather than removing layers, we merge them to limit performance loss. The final model preserves 90% of the original performance while being 48% smaller and 2.15x faster on a MacBook Air M1.
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.40)
- Africa > Mali (0.05)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
PySHRED: A Python package for SHallow REcurrent Decoding for sparse sensing, model reduction and scientific discovery
Ye, David, Williams, Jan, Gao, Mars, Riva, Stefano, Tomasetto, Matteo, Zoro, David, Kutz, J. Nathan
PySHRED is a Python package that implements the SHallow REcurrent D ecoder (SHRED) architecture (Figure 1) and provides a high-level interface for sensing, model reduction and physics discovery tasks. Originally proposed as a sensing strategy which is agnostic to sensor placement [1], SHRED provides a lightweight, data-driven framework for reconstructing and forecasting high-dimensional spatiotemporal states from sparse sensor measurements. SHRED achieves this by (i) encoding time-lagged sensor sequences into a low-dimensional latent space using a sequence model, and (ii) decoding these latent representations back into the full spatial field via a decoder model. Since its introduction as a sparse sensing algorithm, several specialized variants have been developed to extend SHRED's capabilities: SHRED-ROM for parametric reduced-order modeling SINDy-SHRED for discovering sparse latent dynamics and stable long-horizon forecasting Multi-field SHRED for modeling dynamically coupled fields PySHRED unifies these variants into a single open-source, extensible, and thoroughly documented Python package, which is also capable of training on compressed representations of the data, allowing for efficient laptop-level training of models. It is accompanied by a rich example gallery of Jupyter Notebook and Google Colab tutorials.
- North America > United States > Washington > King County > Seattle (0.16)
- Europe > Italy > Lombardy > Milan (0.04)
- Pacific Ocean (0.04)
- (3 more...)
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
Tran, Hieu, Yao, Zonghai, Wang, Junda, Zhang, Yifan, Yang, Zhichao, Yu, Hong
This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework (rStar), aimed at enhancing reasoning accuracy and factual integrity across large language models (LLMs) for complex, knowledge-intensive tasks such as commonsense and medical reasoning. RARE incorporates two innovative actions within the Monte Carlo Tree Search (MCTS) framework: A6, which generates search queries based on the initial problem statement, performs information retrieval using those queries, and augments reasoning with the retrieved data to formulate the final answer; and A7, which leverages information retrieval specifically for generated sub-questions and re-answers these sub-questions with the relevant contextual information. Additionally, a Retrieval-Augmented Factuality Scorer is proposed to replace the original discriminator, prioritizing reasoning paths that meet high standards of factuality. Experimental results with LLaMA 3.1 show that RARE enables open-source LLMs to achieve competitive performance with top open-source models like GPT-4 and GPT-4o. This research establishes RARE as a scalable solution for improving LLMs in domains where logical coherence and factual integrity are critical.
- North America > United States > Massachusetts > Middlesex County > Lowell (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts > Worcester County > Worcester (0.04)
DynaBERT: Dynamic BERT with Adaptive Width and Depth
The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on BERT compression usually compress the large BERT model to a fixed smaller size, and can not fully satisfy the requirements of different edge devices with various hardware performances. In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust the size and latency by selecting adaptive width and depth. The training process of DynaBERT includes first training a width-adaptive BERT and then allowing both adaptive width and depth, by distilling knowledge from the full-sized model to small sub-networks. Network rewiring is also used to keep the more important attention heads and neurons shared by more sub-networks.
Unified Low-rank Compression Framework for Click-through Rate Prediction
Yu, Hao, Fu, Minghao, Ding, Jiandong, Zhou, Yusheng, Wu, Jianxin
Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective method for computer vision and natural language processing models, but its application in compressing CTR prediction models has been less explored. Due to the limited memory and computing resources, compression of CTR prediction models often confronts three fundamental challenges, i.e., (1). How to reduce the model sizes to adapt to edge devices? (2). How to speed up CTR prediction model inference? (3). How to retain the capabilities of original models after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio, but brings in AUC degradation and additional computing overhead. To address these challenges, we propose a unified low-rank decomposition framework for compressing CTR prediction models. We find that even with the most classic matrix decomposition SVD method, our framework can achieve better performance than the original model. To further improve the effectiveness of our framework, we locally compress the output features instead of compressing the model weights. Our unified low-rank compression framework can be applied to embedding tables and MLP layers in various CTR prediction models. Extensive experiments on two academic datasets and one real industrial benchmark demonstrate that, with 3-5x model size reduction, our compressed models can achieve both faster inference and higher AUC than the uncompressed original models. Our code is at https://github.com/yuhao318/Atomic_Feature_Mimicking.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- (15 more...)
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
Yang, Dongjie, Han, XiaoDong, Gao, Yan, Hu, Yao, Zhang, Shilin, Zhao, Hai
Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference, hindering their scalability for real-time applications like chatbots. To accelerate inference, we store computed keys and values (KV cache) in the GPU memory. Existing methods study the KV cache compression to reduce memory by pruning the pre-computed KV cache. However, they neglect the inter-layer dependency between layers and huge memory consumption in pre-computation. To explore these deficiencies, we find that the number of crucial keys and values that influence future generations decreases layer by layer and we can extract them by the consistency in attention weights. Based on the findings, we propose PyramidInfer, a method that compresses the KV cache by layer-wise retaining crucial context. PyramidInfer saves significant memory by computing fewer keys and values without sacrificing performance. Experimental results show PyramidInfer improves 2.2x throughput compared to Accelerate with over 54% GPU memory reduction in KV cache.
Hybrid Continuum-Eversion Robot: Precise Navigation and Decontamination in Nuclear Environments using Vine Robot
Al-Dubooni, Mohammed, Wong, Cuebong, Althoefer, Kaspar
Soft growing vine robots show great potential for navigation and decontamination tasks in the nuclear industry. This paper introduces a novel hybrid continuum-eversion robot designed to address certain challenges in relation to navigating and operating within pipe networks and enclosed remote vessels. The hybrid robot combines the flexibility of a soft eversion robot with the precision of a continuum robot at its tip, allowing for controlled steering and movement in hard to access and/or complex environments. The design enables the delivery of sensors, liquids, and aerosols to remote areas, supporting remote decontamination activities. This paper outlines the design and construction of the robot and the methods by which it achieves selective steering. We also include a comprehensive review of current related work in eversion robotics, as well as other steering devices and actuators currently under research, which underpin this novel active steering approach. This is followed by an experimental evaluation that demonstrates the robot's real-world capabilities in delivering liquids and aerosols to remote locations. The experiments reveal successful outcomes, with over 95% success in precision spraying tests. The paper concludes by discussing future work alongside limitations in the current design, ultimately showcasing its potential as a solution for remote decontamination operations in the nuclear industry.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (2 more...)