Low-Rank Plus Sparse Matrix Transfer Learning under Growing Representations and Ambient Dimensions
Chai, Jinhang, Liu, Xuyuan, Chen, Elynn, Yan, Yujun
Learning systems often expand their ambient features or latent representations over time, embedding earlier representations into larger spaces with limited new latent structure. We study transfer learning for structured matrix estimation under simultaneous growth of the ambient dimension and the intrinsic representation, where a well-estimated source task is embedded as a subspace of a higher-dimensional target task. We propose a general transfer framework in which the target parameter decomposes into an embedded source component, low-dimensional low-rank innovations, and sparse edits, and develop an anchored alternating projection estimator that preserves transferred subspaces while estimating only low-dimensional innovations and sparse modifications. We establish deterministic error bounds that separate target noise, representation growth, and source estimation error, yielding strictly improved rates when rank and sparsity increments are small. We demonstrate the generality of the framework by applying it to two canonical problems. For Markov transition matrix estimation from a single trajectory, we derive end-to-end theoretical guarantees under dependent noise. For structured covariance estimation under enlarged dimensions, we provide complementary theoretical analysis in the appendix and empirically validate consistent transfer gains.
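To make the alternation concrete, here is a minimal sketch of a low-rank-plus-sparse alternating projection around an embedded source estimate (our simplification, not the paper's anchored estimator: the embedding, the observation model, and the stopping rule are all assumed here):

```python
import numpy as np

def anchored_alt_proj(Y, A_src, r_new, s_new, n_iter=50):
    """Sketch: fit Theta ~ embed(A_src) + L_new + S_new from a noisy target
    observation Y, with L_new of rank r_new and S_new having s_new non-zeros.
    Illustrative only; not the paper's exact estimator."""
    A_emb = np.zeros_like(Y)
    A_emb[:A_src.shape[0], :A_src.shape[1]] = A_src  # embed source as a sub-block
    L, S = np.zeros_like(Y), np.zeros_like(Y)
    for _ in range(n_iter):
        # project the residual (source and sparse parts removed) onto rank r_new
        U, d, Vt = np.linalg.svd(Y - A_emb - S, full_matrices=False)
        L = (U[:, :r_new] * d[:r_new]) @ Vt[:r_new]
        # hard-threshold the remaining residual to its s_new largest entries
        R = Y - A_emb - L
        keep = np.argsort(np.abs(R), axis=None)[-s_new:]
        S = np.zeros_like(Y)
        S.flat[keep] = R.flat[keep]
    return A_emb + L + S
```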
Manifold denoising by Nonlinear Robust Principal Component Analysis
This paper extends robust principal component analysis (RPCA) to nonlinear manifolds. Suppose that the observed data matrix is the sum of a sparse component and a component drawn from some low-dimensional manifold. Is it possible to separate them using ideas similar to those of RPCA? Is there any benefit in treating the manifold as a whole, as opposed to treating each local region independently? We answer both questions affirmatively by proposing and analyzing an optimization framework that separates the sparse component from the manifold in the presence of noise. Theoretical error bounds are provided when the tangent spaces of the manifold satisfy certain incoherence conditions. We also provide a near-optimal choice of the tuning parameters for the proposed optimization formulation with the help of a new curvature estimation method. The efficacy of our method is demonstrated on both synthetic and real datasets.
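For reference, the classical linear RPCA baseline that this work generalizes can be solved by principal component pursuit; a textbook ADMM sketch (standard formulation, not the paper's manifold method):

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Entrywise soft thresholding: prox of tau * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_pcp(M, lam=None, mu=None, n_iter=200):
    """ADMM for  min ||L||_* + lam * ||S||_1  s.t.  M = L + S."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # scaled dual variable
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = soft(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
    return L, S
```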
Transformed $\ell_1$ Regularizations for Robust Principal Component Analysis: Toward a Fine-Grained Understanding
Zhao, Kun, Zhang, Haoke, Wang, Jiayi, Lou, Yifei
Robust Principal Component Analysis (RPCA) aims to recover a low-rank structure from noisy, partially observed data that is also corrupted by sparse, potentially large-magnitude outliers. Traditional RPCA models rely on convex relaxations, such as nuclear norm and $\ell_1$ norm, to approximate the rank of a matrix and the $\ell_0$ functional (the number of non-zero elements) of another. In this work, we advocate a nonconvex regularization method, referred to as transformed $\ell_1$ (TL1), to improve both approximations. The rationale is that by varying the internal parameter of TL1, its behavior asymptotically approaches either $\ell_0$ or $\ell_1$. Since the rank is equal to the number of non-zero singular values and the nuclear norm is defined as their sum, applying TL1 to the singular values can approximate either the rank or the nuclear norm, depending on its internal parameter. We conduct a fine-grained theoretical analysis of statistical convergence rates, measured in the Frobenius norm, for both the low-rank and sparse components under general sampling schemes. These rates are comparable to those of the classical RPCA model based on the nuclear norm and $\ell_1$ norm. Moreover, we establish constant-order upper bounds on the estimated rank of the low-rank component and the cardinality of the sparse component in the regime where TL1 behaves like $\ell_0$, assuming that the respective matrices are exactly low-rank and exactly sparse. Extensive numerical experiments on synthetic data and real-world applications demonstrate that the proposed approach achieves higher accuracy than the classic convex model, especially under non-uniform sampling schemes.
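For reference, the standard transformed $\ell_1$ penalty and its two limiting regimes (a well-known closed form; the paper's exact normalization may differ):
$$\rho_a(x) = \frac{(a+1)\,|x|}{a+|x|}, \quad a>0, \qquad \lim_{a\to 0^+}\rho_a(x)=\mathbf{1}\{x\neq 0\}, \qquad \lim_{a\to\infty}\rho_a(x)=|x|.$$
Applied to singular values, $\sum_i \rho_a(\sigma_i(X))$ thus interpolates between $\mathrm{rank}(X)$ (as $a\to 0^+$) and the nuclear norm $\|X\|_*$ (as $a\to\infty$).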
RecIS: Sparse to Dense, A Unified Training Framework for Recommendation Models
Zong, Hua, Zeng, Qingtao, Zhou, Zhengxiong, Han, Zhihua, Yan, Zhensong, Liu, Mingjie, Sun, Hechen, Liu, Jiawei, Hu, Yiwen, Wang, Qi, Xian, YiHan, Guo, Wenjie, Xiang, Houyuan, Zeng, Zhiyuan, Sheng, Xiangrong, Yan, Bencheng, Hu, Nan, Huang, Yuheng, Lian, Jinqing, Xu, Ziru, Zhang, Yan, Huang, Ju, Yang, Siran, Yi, Huimin, Wang, Jiamang, Wang, Pengjie, Zhu, Han, Wu, Jian, Ou, Dan, Xu, Jian, Tang, Haihong, Jiang, Yuning, Zheng, Bo, Qu, Lin
In this paper, we propose RecIS, a unified sparse-dense training framework designed to achieve two primary goals: 1. Unified framework: to create a unified sparse-dense training framework based on the PyTorch ecosystem that meets the training needs of industrial-grade recommendation models integrated with large models. 2. System optimization: to optimize the sparse component, offering efficiency superior to TensorFlow-based recommendation models, while the dense component leverages existing optimization technologies within the PyTorch ecosystem. RecIS is currently used at Alibaba for numerous large-model-enhanced recommendation training tasks, and some traditional sparse models have also begun training with it.
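A toy sketch of the sparse/dense split in plain PyTorch (hypothetical module and names, not RecIS's API): a sparse-gradient embedding table trained with SparseAdam alongside a dense MLP trained with Adam.

```python
import torch
import torch.nn as nn

class SparseDenseRec(nn.Module):
    """Illustration of the sparse/dense split: sparse embedding lookups for
    categorical features, a dense MLP on top. Not RecIS itself."""
    def __init__(self, n_ids=1_000_000, emb_dim=16, n_feats=8):
        super().__init__()
        # sparse component: large embedding table with sparse gradients
        self.emb = nn.Embedding(n_ids, emb_dim, sparse=True)
        # dense component: standard PyTorch layers
        self.mlp = nn.Sequential(
            nn.Linear(n_feats * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, ids):              # ids: (batch, n_feats) int64
        x = self.emb(ids).flatten(1)     # (batch, n_feats * emb_dim)
        return self.mlp(x).squeeze(-1)

model = SparseDenseRec()
# separate optimizers: SparseAdam for the sparse grads, Adam for the dense part
opt_sparse = torch.optim.SparseAdam(model.emb.parameters(), lr=1e-3)
opt_dense = torch.optim.Adam(model.mlp.parameters(), lr=1e-3)
```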
LOST: Low-rank and Sparse Pre-training for Large Language Models
Li, Jiaxi, Yin, Lu, Shen, Li, Xu, Jinjin, Xu, Liwu, Huang, Tianjin, Wang, Wenwu, Liu, Shiwei, Wang, Xilu
While large language models (LLMs) have achieved remarkable performance across a wide range of tasks, their massive scale incurs prohibitive computational and memory costs for pre-training from scratch. Recent studies have investigated low-rank parameterization as a means of reducing model size and training cost. In this context, sparsity is often employed as a complementary technique to recover important information lost in low-rank compression by capturing salient features in the residual space. However, existing approaches typically combine low-rank and sparse components in a simplistic or ad hoc manner, often resulting in undesirable performance degradation compared to full-rank training. In this paper, we propose \textbf{LO}w-rank and \textbf{S}parse pre-\textbf{T}raining (\textbf{LOST}) for LLMs, a novel method that integrates low-rank and sparse structures to enable effective training of LLMs from scratch under strict efficiency constraints. LOST applies singular value decomposition to weight matrices, preserving the dominant low-rank components, while allocating the remaining singular values to construct channel-wise sparse components that complement the expressiveness of low-rank training. We evaluate LOST on LLM pre-training with model sizes ranging from 60M to 7B parameters. Our experiments show that LOST achieves competitive or superior performance compared to full-rank models, while significantly reducing both memory and compute overhead. Code is available at \href{https://github.com/JiaxiLi1/LOST-Low-rank-and-Sparse-Training-for-Large-Language-Models}{LOST Repo}.
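A hedged sketch of the low-rank-plus-channel-sparse split described above (function and variable names are ours; the paper's allocation rule may differ):

```python
import torch

def lowrank_plus_sparse_init(W, rank, n_channels):
    """Keep the top-`rank` singular directions as low-rank factors, then form
    a channel-wise sparse complement from the residual's largest columns.
    Illustrative only; not LOST's exact procedure."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    B = U[:, :rank] * S[:rank]        # (out, rank) low-rank factor
    A = Vh[:rank]                     # (rank, in) low-rank factor
    residual = W - B @ A              # information lost by truncation
    # channel-wise sparsity: keep only columns with the largest residual energy
    keep = residual.norm(dim=0).topk(n_channels).indices
    S_sparse = torch.zeros_like(W)
    S_sparse[:, keep] = residual[:, keep]
    return B, A, S_sparse             # W ~ B @ A + S_sparse
```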
Representation Decomposition for Learning Similarity and Contrastness Across Modalities for Affective Computing
Tian, Yuanhe, Cheng, Pengsen, Jin, Guoqing, Zhang, Lei, Song, Yan
Multi-modal affective computing aims to automatically recognize and interpret human attitudes from diverse data sources such as images and text, thereby enhancing human-computer interaction and emotion understanding. Existing approaches typically rely on unimodal analysis or straightforward fusion of cross-modal information, which fails to capture the complex and conflicting evidence presented across different modalities. In this paper, we propose a novel LLM-based approach for affective computing that explicitly decomposes visual and textual representations into shared (modality-invariant) and modality-specific components. Specifically, our approach first encodes and aligns input modalities using pre-trained multi-modal encoders, then employs a representation decomposition framework to separate common emotional content from unique cues, and finally integrates these decomposed signals via an attention mechanism to form a dynamic soft prompt for a multi-modal LLM. Extensive experiments on three representative affective computing tasks, namely multi-modal aspect-based sentiment analysis, multi-modal emotion analysis, and hateful meme detection, demonstrate the effectiveness of our approach, which consistently outperforms strong baselines and state-of-the-art models.
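A minimal sketch of the shared/specific decomposition idea (hypothetical layer names; the paper's framework additionally feeds the decomposed signals into a soft prompt for a multi-modal LLM):

```python
import torch
import torch.nn as nn

class SharedPrivateDecomp(nn.Module):
    """Split each modality embedding into a modality-invariant part (shared
    projection) and a modality-specific part (private projections).
    Illustrative only; not the paper's architecture."""
    def __init__(self, dim):
        super().__init__()
        self.shared = nn.Linear(dim, dim)       # shared across modalities
        self.private_img = nn.Linear(dim, dim)  # image-specific
        self.private_txt = nn.Linear(dim, dim)  # text-specific

    def forward(self, img_emb, txt_emb):
        c_img, c_txt = self.shared(img_emb), self.shared(txt_emb)
        s_img, s_txt = self.private_img(img_emb), self.private_txt(txt_emb)
        # an alignment loss pulls the shared parts of the two modalities together
        align = 1 - nn.functional.cosine_similarity(c_img, c_txt, dim=-1).mean()
        return (c_img, s_img), (c_txt, s_txt), align
```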
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities
Zhang, Junyan, Gao, Yubo, Yan, Yibo, Li, Jungang, Hou, Zhaorui, Tao, Sicheng, Liu, Shuliang, Dai, Song, Hei, Yonghua, Li, Junzhuo, Hu, Xuming
The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities, yet the underlying computational mechanisms driving these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instructional dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework comprising three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate the functional generality and uniqueness of these components and their critical role in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work provides deeper insights into how LLMs internalize instruction-following behavior for the trustworthy-LLM community.
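A hedged sketch of one way such instruction-specific neurons could be scored (illustrative only; not SPARCOM's identification method):

```python
import torch

def instruction_specific_neurons(acts_instr, acts_base, top_k=100):
    """Score each hidden unit by how much its mean activation changes on
    instruction inputs vs. generic inputs; keep the top-k as candidate
    instruction-specific neurons.
    acts_*: (n_examples, n_neurons) hidden activations."""
    score = (acts_instr.mean(dim=0) - acts_base.mean(dim=0)).abs()
    return score.topk(top_k).indices
```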
Transfer Learning for High-dimensional Reduced Rank Time Series Models
Ma, Mingliang, Safikhani, Abolfazl
The objective of transfer learning is to enhance estimation and inference on a target dataset by leveraging knowledge gained from additional sources. Recent studies have explored transfer learning for independent observations in complex, high-dimensional models assuming sparsity, yet research on time series models remains limited. Our focus is on transfer learning for sequences of observations with temporal dependencies and a more intricate model parameter structure. Specifically, we investigate the vector autoregressive (VAR) model, a widely recognized model for time series data, in which the transition matrix can be decomposed into the sum of a sparse matrix and a low-rank one. We propose a new transfer learning algorithm tailored to estimating high-dimensional VAR models characterized by low-rank and sparse structures. Additionally, we present a novel approach for selecting informative observations from auxiliary datasets. Theoretical guarantees are established, encompassing model parameter consistency, informative set selection, and the asymptotic distribution of the estimators under mild conditions; the latter facilitates the construction of entry-wise confidence intervals for model parameters. Finally, we demonstrate the empirical efficacy of our methodologies through both simulated and real-world datasets.
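A sketch of the underlying single-task estimation problem (proximal gradient for a sparse-plus-low-rank VAR(1); the paper's transfer-learning algorithm and informative-set selection are not shown):

```python
import numpy as np

def soft(X, t):   # prox of t * l1 norm
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svt(X, t):    # prox of t * nuclear norm
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def var_sparse_lowrank(X, lam_s, lam_l, step=None, n_iter=300):
    """Penalized least squares for a VAR(1) with transition matrix A = S + L
    (sparse plus low-rank), via alternating proximal gradient steps.
    X: (T, p) observed series. Illustrative only."""
    Y, Z = X[1:], X[:-1]                 # regress x_t on x_{t-1}
    T = Y.shape[0]
    step = step or 1.0 / np.linalg.norm(Z.T @ Z / T, 2)
    p = X.shape[1]
    S = np.zeros((p, p))
    L = np.zeros((p, p))
    for _ in range(n_iter):
        G = ((S + L) @ Z.T @ Z - Y.T @ Z) / T   # gradient of the LS loss
        S = soft(S - step * G, step * lam_s)
        L = svt(L - step * G, step * lam_l)
    return S, L
```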