MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Wan, Zhongwei, Shen, Hui, Wang, Xin, Liu, Che, Mai, Zheda, Zhang, Mi
Long-context Multimodal Large Language Models (MLLMs), which incorporate long text-image and text-video inputs, demand substantial resources as their multimodal Key-Value (KV) caches grow with increasing input lengths, challenging inference efficiency. Existing KV cache compression methods, for both text-only and multimodal LLMs, have neglected attention density variations across layers and thus often adopt uniform or progressive reduction strategies for layer-wise cache allocation. In this work, we propose MEDA, a dynamic layer-wise KV cache allocation method for efficient multimodal long-context inference. At its core, MEDA utilizes cross-modal attention entropy to determine the KV cache size at each MLLM layer. Given the dynamically allocated KV cache size at each layer, MEDA also employs a KV pair selection scheme to identify which KV pairs to retain and a KV pair merging strategy that merges the selected and non-selected pairs to preserve information from the entire context. MEDA achieves up to 72% KV cache memory reduction and 2.82 times faster decoding, while maintaining or enhancing performance on various multimodal tasks in long-context settings, including multi-image and long-video scenarios. Our code is released at https://github.com/AIoT-MLSys-Lab/MEDA.
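To make the abstract's three-step pipeline concrete, here is a minimal sketch of entropy-based budget allocation, KV pair selection, and merging. Every function name, the attention-score selection criterion, and the 0.5 merge weight are illustrative assumptions, not the released implementation; see the linked repository for the actual code.

```python
# Hedged sketch of MEDA-style layer-wise KV cache allocation (assumed details).
import torch

def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Mean entropy of attention distributions; attn: (heads, q_len, kv_len)."""
    p = attn.clamp_min(1e-9)
    return -(p * p.log()).sum(dim=-1).mean()

def allocate_kv_budget(layer_attns, total_budget: int):
    """Split a total KV budget across layers in proportion to each layer's
    cross-modal attention entropy (denser attention -> larger cache)."""
    ent = torch.stack([attention_entropy(a) for a in layer_attns])
    weights = ent / ent.sum()
    return [max(1, int(w.item() * total_budget)) for w in weights]

def select_and_merge(keys, values, attn, budget: int):
    """Keep the `budget` KV pairs with the highest accumulated attention, then
    merge each evicted pair into its most similar kept pair (0.5/0.5 averaging
    is an assumption) so information from the full context is preserved."""
    scores = attn.sum(dim=(0, 1))                      # (kv_len,) accumulated attention
    keep = scores.topk(min(budget, scores.numel())).indices
    kept = set(keep.tolist())
    for d in [i for i in range(keys.size(0)) if i not in kept]:
        t = (keys[keep] @ keys[d]).argmax()            # most similar kept key
        keys[keep[t]] = 0.5 * (keys[keep[t]] + keys[d])
        values[keep[t]] = 0.5 * (values[keep[t]] + values[d])
    return keys[keep], values[keep]

if __name__ == "__main__":
    heads, q, kv, d = 4, 8, 32, 16
    attn = torch.softmax(torch.randn(heads, q, kv), dim=-1)
    k2, v2 = select_and_merge(torch.randn(kv, d), torch.randn(kv, d), attn, budget=8)
    print(k2.shape, v2.shape)  # torch.Size([8, 16]) torch.Size([8, 16])
```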
Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction
Fan, Zhongxiang, Liu, Zhaocheng, Liang, Jian, Kong, Dongying, Li, Han, Jiang, Peng, Li, Shuang, Gai, Kun
This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, whether multi-epoch training can outperform the conventional one-epoch approach remains unclear. We identify overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary issue. To address it, we introduce a novel and simple Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios, which can be seamlessly integrated into existing deep CTR models and may also help address the "forgetting or overfitting" dilemma in retraining as well as the well-known catastrophic forgetting problem. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data and on the Multi-Layer Perceptron (MLP) layers, and achieves data augmentation by training the MLP with varied embedding spaces. Our findings confirm that pre-trained MLP layers can adapt to new embedding spaces, enhancing performance without overfitting. This adaptability underscores the MLP layers' role in learning a matching function that focuses on the relative relationships among embeddings rather than their absolute positions. To our knowledge, MEDA represents the first multi-epoch training strategy tailored for deep CTR prediction models. We conduct extensive experiments on several public and business datasets, fully demonstrating the effectiveness of the data augmentation and its superiority over conventional single-epoch training. MEDA has also exhibited significant benefits in a real-world online advertising system.
- Asia > China > Shandong Province > Dongying (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy (0.04)
- Asia > China > Beijing > Beijing (0.04)
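To make the core mechanism of the abstract above concrete, here is a minimal PyTorch sketch of the non-continual MEDA idea: the embedding table can be reinitialized while the MLP keeps its weights, so each pass over the data presents the MLP with a fresh embedding space. The model shape, dimensions, and the `reinit_embedding` name are all illustrative assumptions, not the authors' code.

```python
# Hedged sketch of a deep CTR model with a MEDA-style reinitializable embedding.
import torch
import torch.nn as nn

class CTRModel(nn.Module):
    def __init__(self, n_features: int, emb_dim: int = 16, n_fields: int = 10):
        super().__init__()
        self.embedding = nn.Embedding(n_features, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(n_fields * emb_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def reinit_embedding(self):
        # MEDA step: discard the learned (overfitting-prone) embeddings
        # while the pre-trained MLP layers keep their weights.
        nn.init.normal_(self.embedding.weight, std=0.01)

    def forward(self, x):  # x: (batch, n_fields) of integer feature ids
        e = self.embedding(x).flatten(1)
        return self.mlp(e).squeeze(-1)  # logits for BCE-with-logits loss
```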
Multi-Epoch Learning for Deep Click-Through Rate Prediction Models
Liu, Zhaocheng, Fan, Zhongxiang, Liang, Jian, Kong, Dongying, Li, Han
The one-epoch overfitting phenomenon has been widely observed in industrial Click-Through Rate (CTR) applications, where model performance degrades significantly at the beginning of the second epoch. Recent work has tried to understand the underlying factors behind this phenomenon through extensive experiments. However, it remains unknown whether a multi-epoch training paradigm could achieve better results, as the best performance is usually obtained with one-epoch training. In this paper, we hypothesize that this phenomenon may be attributed to the susceptibility of the embedding layer to overfitting, which can stem from the high-dimensional sparsity of the data. To maintain feature sparsity while avoiding embedding overfitting, we propose a novel Multi-Epoch learning with Data Augmentation (MEDA) scheme that can be directly applied to most deep CTR models. MEDA achieves data augmentation by reinitializing the embedding layer in each epoch, thereby avoiding embedding overfitting while improving convergence. To the best of our knowledge, MEDA is the first multi-epoch training paradigm designed for deep CTR prediction models. We conduct extensive experiments on several public datasets, fully verifying the effectiveness of the proposed MEDA. Notably, the results show that MEDA can significantly outperform conventional one-epoch training. MEDA has also exhibited significant benefits in a real-world scenario at Kuaishou.
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Shandong Province > Dongying (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy (0.04)
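This second MEDA abstract emphasizes per-epoch reinitialization of the embedding layer. A hedged sketch of that multi-epoch loop, driving the CTRModel sketched after the previous abstract, might look like the following; resetting the optimizer state each epoch is an extra assumption the abstract does not specify.

```python
# Hedged sketch of a MEDA multi-epoch training loop (assumed details).
import torch

def train_meda(model, loader, n_epochs: int = 3, lr: float = 1e-3):
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for epoch in range(n_epochs):
        model.reinit_embedding()  # fresh embedding space per epoch = data augmentation
        # Assumption: optimizer state is also reset alongside the embedding.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y.float())
            loss.backward()
            opt.step()

if __name__ == "__main__":
    # Synthetic usage with the CTRModel from the earlier sketch.
    xs = torch.randint(0, 1000, (512, 10))
    ys = torch.randint(0, 2, (512,))
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(xs, ys), batch_size=64)
    train_meda(CTRModel(n_features=1000), loader)
```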
NASA's Perseverance rover snaps selfies of its 'head' and 'face'
NASA's Perseverance rover has sent back two selfies of its camera-laden 'face' and 'head' from the Jezero Crater on the surface of Mars. The two snaps show Perseverance's remote sensing mast, which hosts many of the rover's cameras and scientific instruments. They were taken with the SHERLOC WATSON camera, located on the turret at the end of the rover's robotic arm. Perseverance touched down on the Red Planet on February 18 after a nearly seven-month journey through space. It is tasked with seeking traces of fossilised microbial life from Mars' ancient past and with collecting rock specimens for return to Earth on future missions to the Red Planet.
- Government > Space Agency (0.96)
- Government > Regional Government > North America Government > United States Government (0.96)