singular component
Why Can Accurate Models Be Learned from Inaccurate Annotations?
Si, Chongjie, Cui, Yidan, Yang, Fuchao, Yang, Xiaokang, Shen, Wei
Learning from inaccurate annotations has gained significant attention due to the high cost of precise labeling. However, despite the presence of erroneous labels, models trained on noisy data often retain the ability to make accurate predictions. This intriguing phenomenon raises a fundamental yet largely unexplored question: why can models still extract correct label information from inaccurate annotations? In this paper, we conduct a comprehensive investigation into this issue. By analyzing weight matrices from both empirical and theoretical perspectives, we find that label inaccuracy primarily accumulates noise in lower singular components and subtly perturbs the principal subspace. Within a certain range, the principal subspaces of weights trained on inaccurate labels remain largely aligned with those learned from clean labels, preserving essential task-relevant information. We formally prove that the angles of principal subspaces exhibit minimal deviation under moderate label inaccuracy, explaining why models can still generalize effectively. Building on these insights, we propose LIP, a lightweight plug-in designed to help classifiers retain principal subspace information while mitigating noise induced by label inaccuracy. Extensive experiments on tasks with various inaccuracy conditions demonstrate that LIP consistently enhances the performance of existing algorithms. We hope our findings offer valuable theoretical and practical insights for understanding model robustness under inaccurate supervision.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (10 more...)
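A minimal NumPy sketch of the subspace comparison the LIP abstract describes: compute the principal angles between the top-k left singular subspaces of a weight matrix trained on clean labels and one trained on noisy labels. The function name, the use of left singular vectors, and the rank cutoff k are illustrative assumptions, not the paper's LIP plug-in.

```python
import numpy as np

def principal_angles(W_clean, W_noisy, k):
    """Principal angles (radians) between the rank-k left singular
    subspaces of two weight matrices."""
    U_c, _, _ = np.linalg.svd(W_clean, full_matrices=False)
    U_n, _, _ = np.linalg.svd(W_noisy, full_matrices=False)
    # Cosines of the principal angles are the singular values of the
    # product of the two leading-k orthonormal bases.
    cos_theta = np.linalg.svd(U_c[:, :k].T @ U_n[:, :k], compute_uv=False)
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))
```

Small angles indicate that the principal subspace learned under label noise stays aligned with the clean one, which is the alignment the paper's theory quantifies.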
On Unbiased Low-Rank Approximation with Minimum Distortion
Barnes, Leighton Pate, Cameron, Stephen, Howard, Benjamin
We describe an algorithm for sampling a low-rank random matrix $Q$ that best approximates a fixed target matrix $P\in\mathbb{C}^{n\times m}$ in the following sense: $Q$ is unbiased, i.e., $\mathbb{E}[Q] = P$; $\mathsf{rank}(Q)\leq r$; and $Q$ minimizes the expected Frobenius norm error $\mathbb{E}\|P-Q\|_F^2$. Our algorithm mirrors the solution to the efficient unbiased sparsification problem for vectors, except applied to the singular components of the matrix $P$. Optimality is proven by showing that our algorithm matches the error from an existing lower bound.
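To make the setting concrete, here is a hedged NumPy sketch of one unbiased rank-at-most-r estimator built by importance sampling over the singular components of $P$, mirroring unbiased vector sparsification. It satisfies $\mathbb{E}[Q] = P$ and $\mathsf{rank}(Q)\leq r$, but it is only an illustration and is not the minimum-distortion scheme the paper proves optimal.

```python
import numpy as np

def unbiased_rank_r_sample(P, r, seed=None):
    """Return a random Q with rank(Q) <= r and E[Q] = P.

    Illustrative importance-sampling construction over the singular
    components of P; not the paper's minimum-distortion algorithm.
    """
    rng = np.random.default_rng(seed)
    U, s, Vh = np.linalg.svd(P, full_matrices=False)
    p = s / s.sum()                          # sampling weights over components
    idx = rng.choice(len(s), size=r, p=p)    # r i.i.d. draws, with replacement
    Q = np.zeros(P.shape, dtype=U.dtype)
    for i in idx:
        # Inverse-probability weighting keeps the estimator unbiased.
        Q += (s[i] / p[i]) * np.outer(U[:, i], Vh[i, :]) / r
    return Q
```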
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
Lee, Chanhyuk, Choi, Jiho, Lee, Chanryeol, Kim, Donggyun, Hong, Seunghoon
Model merging has emerged as a promising approach for unifying independently fine-tuned models into an integrated framework, significantly enhancing computational efficiency in multi-task learning. Recently, several SVD-based techniques have been introduced to exploit low-rank structures for enhanced merging, but their reliance on manually designed rank selection often leads to cross-task interference and suboptimal performance. In this paper, we propose AdaRank, a novel model merging framework that adaptively selects the most beneficial singular directions of task vectors to merge multiple models. We empirically show that the dominant singular components of task vectors can cause critical interference with other tasks, and that naive truncation across tasks and layers degrades performance. In contrast, AdaRank dynamically prunes the singular components that cause interference and offers an optimal amount of information to each task vector by learning to prune ranks at test time via entropy minimization. Our analysis demonstrates that this method mitigates detrimental overlaps among tasks, while empirical results show that AdaRank consistently achieves state-of-the-art performance across various backbones and numbers of tasks, reducing the performance gap to fine-tuned models to nearly 1%.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Research Report > Promising Solution (0.86)
- Research Report > New Finding (0.66)
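A rough PyTorch sketch of the AdaRank idea described above, assuming the merge operates per weight matrix: each task vector is decomposed by SVD, a learnable sigmoid gate softly prunes its singular components, and the gates are tuned at test time by minimizing the entropy of the model's predictions. The gating parameterization and function names are assumptions, not AdaRank's exact implementation.

```python
import torch

def merge_with_gates(base_weight, task_vectors, gate_logits):
    """Merge per-task weight deltas after gating their singular components.

    task_vectors: list of (d_out, d_in) deltas (fine-tuned weight minus base).
    gate_logits:  list of learnable per-component logits, one per task vector.
    """
    merged = base_weight.clone()
    for delta, logits in zip(task_vectors, gate_logits):
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        gate = torch.sigmoid(logits)            # soft keep/prune per singular direction
        merged = merged + U @ torch.diag(S * gate) @ Vh
    return merged

def prediction_entropy(logits):
    """Test-time objective: entropy of the model's predictive distribution."""
    p = torch.softmax(logits, dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1).mean()
```

Minimizing `prediction_entropy` on unlabeled test inputs with respect to `gate_logits` adapts the effective rank of each task vector without access to labels.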
$\text{Transformer}^2$: Self-adaptive LLMs
Sun, Qi, Cetin, Edoardo, Tang, Yujin
This dynamic adjustment has parallels to concepts like fast-weight memories, which enable networks to update weights in response to task demands (Schmidhuber, 1992; Gomez & Schmidhuber, 2005), and to neural network weights being treated as dynamic programs (Schmidhuber, 2015). Recently, Panigrahi et al. (2023) introduced an approach in which a smaller auxiliary transformer is updated dynamically within a larger model, aligning with the principles of self-adaptive behavior. This adaptation can be explored from two perspectives: a macroview, where multiple LLMs collaborate and/or compete, and a microview, where internal adaptations allow a single LLM to specialize in different tasks. Macroview: From this perspective, the system directs queries to LLMs with domain-specific expertise, prioritizing outputs from expert models and thereby achieving higher accuracy and task-specific optimization. Such task-specific ensembles can be realized through various mechanisms: multiple LLMs playing distinct roles and coordinating toward a shared goal (Zhuge et al., 2023), engaging in mutual listening and debate (Du et al., 2023), or using meticulously crafted prompt constructions (Zhang et al., 2024) to integrate knowledge libraries and skill planning.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Jordan (0.04)
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
Wang, Hanqing, Xiao, Zeguan, Li, Yixia, Wang, Shuo, Chen, Guanhua, Chen, Yun
Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computation and memory cost. Previous LoRA-based approaches initialize the low-rank matrices with a Gaussian distribution and zero values, while keeping the original weight matrices frozen. However, the trainable model parameters optimized in an unguided subspace may interfere with the well-learned subspace of the pretrained weight matrix. In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen. It is observed that the minor matrix corresponds to noisy or long-tail information, while the principal matrix contains important knowledge. MiLoRA initializes the low-rank matrices within a subspace that is orthogonal to the principal matrix, so the pretrained knowledge is expected to be well preserved. During finetuning, MiLoRA makes the most of the less-optimized subspace for learning the finetuning dataset. Extensive experiments on commonsense reasoning, math reasoning and instruction following benchmarks demonstrate the superior performance of our method.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Asia > Singapore (0.04)
- (4 more...)
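A compact PyTorch sketch of the initialization the MiLoRA abstract describes, under the assumption that the adapter product B @ A is set to the r smallest singular components of the pretrained weight while the remaining principal part stays frozen. The function name and the square-root split of the singular values are illustrative choices, not necessarily the paper's exact recipe.

```python
import torch

def minor_component_init(W, r):
    """Initialize low-rank factors from the r smallest singular components of W.

    Returns the frozen principal part of W together with trainable factors
    A (r x d_in) and B (d_out x r) whose product equals the minor part.
    """
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_m, S_m, Vh_m = U[:, -r:], S[-r:], Vh[-r:, :]
    B = U_m @ torch.diag(S_m.sqrt())      # (d_out, r)
    A = torch.diag(S_m.sqrt()) @ Vh_m     # (r, d_in)
    W_frozen = W - B @ A                  # principal components, kept frozen
    return W_frozen, A, B
```

Because B @ A starts inside the minor subspace, which is orthogonal to the principal singular directions, early updates to A and B leave the well-learned principal subspace untouched.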
GARNET: Reduced-Rank Topology Learning for Robust and Scalable Graph Neural Networks
Deng, Chenhui, Li, Xiuyu, Feng, Zhuo, Zhang, Zhiru
Graph neural networks (GNNs) have been increasingly deployed in various applications that involve learning on non-Euclidean data. However, recent studies show that GNNs are vulnerable to graph adversarial attacks. Although there are several defense methods to improve GNN robustness by eliminating adversarial components, they may also impair the underlying clean graph structure that contributes to GNN training. In addition, few of those defense models can scale to large graphs due to their high computational complexity and memory usage. In this paper, we propose GARNET, a scalable spectral method to boost the adversarial robustness of GNN models. GARNET first leverages weighted spectral embedding to construct a base graph, which is not only resistant to adversarial attacks but also contains the critical (clean) graph structure needed for GNN training. Next, GARNET further refines the base graph by pruning additional uncritical edges based on a probabilistic graphical model. GARNET has been evaluated on various datasets, including a large graph with millions of nodes. Our extensive experimental results show that GARNET achieves adversarial accuracy improvements of up to 13.27% and runtime speedups of up to 14.7x over state-of-the-art GNN (defense) models.
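For intuition, here is a hedged NumPy/SciPy sketch of the first stage described above: embed nodes with an eigenvalue-weighted spectral embedding of the (possibly attacked) weighted adjacency matrix and connect nearest neighbors in that space to form a base graph. The second, graphical-model-based edge-pruning stage is omitted, and all names and parameter choices here are illustrative rather than GARNET's actual procedure.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def spectral_base_graph(adj, dim=32, knn=10):
    """Build a denoised base graph from an eigenvalue-weighted spectral embedding.

    adj: symmetric (sparse) weighted adjacency matrix with float entries.
    """
    vals, vecs = eigsh(adj, k=dim, which='LA')   # leading eigenpairs
    emb = vecs * vals                            # eigenvalue-weighted embedding
    # Connect each node to its knn nearest neighbors in embedding space.
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :knn]
    base = np.zeros(d2.shape)
    rows = np.repeat(np.arange(d2.shape[0]), knn)
    base[rows, nbrs.ravel()] = 1.0
    return np.maximum(base, base.T)              # symmetrize the kNN graph
```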
Variational approach for learning Markov processes from time series data
Inference, prediction and control of complex dynamical systems from time series is important in many areas, including financial markets, power grid management, climate and weather modeling, or molecular dynamics. The analysis of such highly nonlinear dynamical systems is facilitated by the fact that we can often find a (generally nonlinear) transformation of the system coordinates to features in which the dynamics can be excellently approximated by a linear Markovian model. Moreover, the many system variables often change collectively on large time- and length-scales, facilitating a low-dimensional analysis in feature space. In this paper, we introduce a variational approach for Markov processes (VAMP) that allows us to find optimal feature mappings and optimal Markovian models of the dynamics from given time series data. The key insight is that the best linear model can be obtained from the top singular components of the Koopman operator. This leads to the definition of a family of score functions called VAMP-r which can be calculated from data, and can be employed to optimize a Markovian model. In addition, based on the relationship between the variational scores and approximation errors of Koopman operators, we propose a new VAMP-E score, which can be applied to cross-validation for hyper-parameter optimization and model selection in VAMP. VAMP is valid for both reversible and nonreversible processes and for stationary and non-stationary processes or realizations.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
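A short NumPy sketch of the score the VAMP abstract refers to, assuming feature trajectories are already given: whiten the instantaneous and time-lagged covariances, take the SVD of the cross-covariance (a finite-dimensional estimate of the Koopman operator's singular components), and sum the squares of the top-k singular values to obtain a VAMP-2 score. Function and variable names are illustrative.

```python
import numpy as np

def vamp2_score(X, Y, k, eps=1e-10):
    """VAMP-2 score from instantaneous features X[t] and time-lagged features Y[t]."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    C00, C11, C01 = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    def inv_sqrt(C):
        # Symmetric inverse square root, truncating near-zero eigenvalues.
        w, V = np.linalg.eigh(C)
        keep = w > eps
        return V[:, keep] @ np.diag(w[keep] ** -0.5) @ V[:, keep].T

    K = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)   # whitened Koopman estimate
    s = np.linalg.svd(K, compute_uv=False)
    return float(np.sum(s[:k] ** 2))
```

Maximizing this score over candidate feature mappings selects the features in which the dynamics are best approximated by a linear Markovian model.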
Self reference in word definitions
Levary, David, Eckmann, Jean-Pierre, Moses, Elisha, Tlusty, Tsvi
Dictionaries are inherently circular in nature. A given word is linked to a set of alternative words (the definition) which in turn point to further descendants. Iterating through definitions in this way, one typically finds that definitions loop back upon themselves. The graph formed by such definitional relations is our object of study. By eliminating those links which are not in loops, we arrive at a core subgraph of highly connected nodes. We observe that definitional loops are conveniently classified by length, with longer loops usually emerging from semantic misinterpretation. By breaking the long loops in the graph of the dictionary, we arrive at a set of disconnected clusters. We find that the words in these clusters constitute semantic units, and moreover tend to have been introduced into the English language at similar times, suggesting a possible mechanism for language evolution.
- Asia > Middle East > Israel (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
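The loop-extraction step in the dictionary-graph abstract has a compact graph-theoretic formulation: a definition link u → v lies on a directed cycle exactly when u and v belong to the same strongly connected component. Below is a small networkx sketch with an assumed input format (a dict from each headword to the words used in its definition); it is an illustration, not the authors' code.

```python
import networkx as nx

def definitional_core(definitions):
    """Keep only the definition links that lie on loops.

    definitions: dict mapping each headword to an iterable of the words
    appearing in its definition.
    """
    G = nx.DiGraph((word, d) for word, defn in definitions.items() for d in defn)
    scc_id = {node: i
              for i, comp in enumerate(nx.strongly_connected_components(G))
              for node in comp}
    # An edge is on a cycle iff both endpoints share a strongly connected component.
    return nx.DiGraph((u, v) for u, v in G.edges if scc_id[u] == scc_id[v])
```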