Alternating minimization for dictionary learning with random initialization
Our algorithm is a simple alternating minimization procedure that alternates between an $\ell_1$-minimization step and a gradient-descent step. Dictionary learning, and alternating minimization algorithms for it in particular, are well studied both theoretically and empirically. In contrast to previous theoretical analyses of this problem, however, we replace a condition on the operator norm (that is, the largest singular value) of the true underlying dictionary $A^*$ with a condition on the matrix infinity norm (that is, the largest-magnitude entry). This not only yields convergence rates for the error of the estimated dictionary measured in the matrix infinity norm, but also ensures that a random initialization provably converges to the global optimum. Our guarantees hold under a reasonable generative model that allows dictionaries with growing operator norms and an arbitrary level of overcompleteness, at sparsity levels that are information-theoretically optimal. We also establish upper bounds on the sample complexity of our algorithm.
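The two alternating steps described above can be illustrated on synthetic data. The sketch below is an assumption-laden stand-in, not the paper's algorithm: it uses a few ISTA iterations for the $\ell_1$-minimization step and a plain gradient step on the dictionary, with illustrative dimensions, sparsity level, and step sizes.

```python
import numpy as np

# Toy alternating minimization for dictionary learning: alternate an
# l1-regularized sparse-coding step (a few ISTA iterations) with a
# gradient-descent update of the dictionary. All constants below
# (sizes, lambda, eta) are illustrative assumptions.
rng = np.random.default_rng(0)
n, k, m = 20, 30, 200                       # signal dim, dictionary size, samples
A_true = rng.normal(size=(n, k)) / np.sqrt(n)
X_true = rng.normal(size=(k, m)) * (rng.random((k, m)) < 0.1)  # sparse codes
Y = A_true @ X_true

def soft(v, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

A = rng.normal(size=(n, k)) / np.sqrt(n)    # random initialization
X = np.zeros((k, m))
lam, eta = 0.01, 0.1

for _ in range(50):
    # l1-minimization step (ISTA) with the dictionary held fixed
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the smooth part
    for _ in range(10):
        X = soft(X - (A.T @ (A @ X - Y)) / L, lam / L)
    # gradient-descent step on the dictionary with the codes held fixed
    A -= eta * (A @ X - Y) @ X.T / m

rel_err = np.linalg.norm(A @ X - Y) / np.linalg.norm(Y)
```

On this toy instance the reconstruction residual drops well below the trivial baseline, illustrating the alternation; recovering `A_true` itself up to permutation/sign requires the conditions analyzed in the paper.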
An Improved Analysis of Alternating Minimization for Structured Multi-Response Regression
Multi-response linear models aggregate a set of vanilla linear models by assuming the noise is correlated across them with an unknown covariance structure. To find the coefficient vector, estimators that jointly approximate the noise covariance are often preferred over simple linear regression in view of their superior empirical performance, and they can generally be computed by alternating-minimization-type procedures. Due to the non-convex nature of such joint estimators, theoretically justifying their efficiency is challenging. Existing analyses fail to fully explain the empirical observations because they assume resampling within the alternating procedure, which requires access to fresh samples in each iteration. In this work, we present a resampling-free analysis of the alternating minimization algorithm applied to multi-response regression. In particular, we focus on the high-dimensional setting of multi-response linear models with a structured coefficient parameter, and show that the statistical error of the parameter can be expressed through the Gaussian width, a complexity measure associated with the assumed structure. More importantly, to the best of our knowledge, our result reveals for the first time that alternating minimization with random initialization can achieve the same performance as a well-initialized one when solving this multi-response regression problem. Experimental results support our theoretical developments.
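The alternation between a coefficient update and a noise-covariance update can be sketched on a simple pooled instance (a shared coefficient vector across responses, which is one structured special case and an assumption of this sketch, not the paper's general setting):

```python
import numpy as np

# Flip-flop alternating minimization for a pooled multi-response model
# Y = X theta 1^T + E, where the rows of E have unknown covariance
# Sigma across the m responses. Sizes and the covariance factor L are
# illustrative assumptions.
rng = np.random.default_rng(1)
n, p, m = 2000, 5, 3
X = rng.normal(size=(n, p))
theta_true = rng.normal(size=p)
L = np.array([[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.5, 0.5, 0.7]])
Y = np.outer(X @ theta_true, np.ones(m)) + rng.normal(size=(n, m)) @ L.T

Sigma = np.eye(m)                            # naive initialization
for _ in range(10):
    # coefficient step: generalized least squares for the shared theta,
    # which reduces to weighting the responses by w = Sigma^{-1} 1
    w = np.linalg.solve(Sigma, np.ones(m))
    theta = np.linalg.solve(X.T @ X, X.T @ (Y @ w)) / w.sum()
    # covariance step: empirical covariance of the residuals
    R = Y - np.outer(X @ theta, np.ones(m))
    Sigma = R.T @ R / n

est_err = np.linalg.norm(theta - theta_true)
```

Each half-step solves its subproblem exactly, so the joint (non-convex) objective decreases monotonically; the paper's contribution is the statistical analysis of such iterates without resampling.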
Alternating Minimization for Regression Problems with Vector-valued Outputs
In regression problems involving vector-valued outputs (equivalently, multiple responses), it is well known that the maximum likelihood estimator (MLE), which takes the noise covariance structure into account, can be significantly more accurate than the ordinary least squares (OLS) estimator. However, the existing literature compares OLS and MLE in terms of their asymptotic, rather than finite-sample, guarantees. More crucially, computing the MLE in general requires solving a non-convex optimization problem and is not known to be efficiently solvable. We provide finite-sample upper and lower bounds on the estimation error of OLS and MLE in two popular models: (a) the pooled model and (b) the seemingly unrelated regression (SUR) model. We provide precise instances in which the MLE is significantly more accurate than OLS.
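The OLS-versus-covariance-aware gap is easy to see numerically in a SUR-style toy model. The sketch below is an assumption: two equations with highly correlated noise, per-equation OLS against a single feasible-GLS step toward the MLE, with errors averaged over independent trials. It is not the paper's construction or bounds.

```python
import numpy as np

def trial(seed, n=50, p=3, rho=0.9):
    """One SUR instance: return (OLS error, feasible-GLS error)."""
    rng = np.random.default_rng(seed)
    X1, X2 = rng.normal(size=(n, p)), rng.normal(size=(n, p))
    b1, b2 = rng.normal(size=p), rng.normal(size=p)
    E = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    y1, y2 = X1 @ b1 + E[:, 0], X2 @ b2 + E[:, 1]

    # OLS: each equation fit separately, ignoring the noise correlation
    b1_ols = np.linalg.lstsq(X1, y1, rcond=None)[0]
    b2_ols = np.linalg.lstsq(X2, y2, rcond=None)[0]

    # feasible GLS: estimate the noise covariance from OLS residuals,
    # then solve the joint weighted normal equations
    R = np.column_stack([y1 - X1 @ b1_ols, y2 - X2 @ b2_ols])
    W = np.linalg.inv(R.T @ R / n)
    A = np.block([[W[0, 0] * X1.T @ X1, W[0, 1] * X1.T @ X2],
                  [W[1, 0] * X2.T @ X1, W[1, 1] * X2.T @ X2]])
    rhs = np.concatenate([X1.T @ (W[0, 0] * y1 + W[0, 1] * y2),
                          X2.T @ (W[1, 0] * y1 + W[1, 1] * y2)])
    b_gls = np.linalg.solve(A, rhs)

    truth = np.concatenate([b1, b2])
    ols = np.concatenate([b1_ols, b2_ols])
    return np.linalg.norm(ols - truth), np.linalg.norm(b_gls - truth)

errs = np.array([trial(s) for s in range(30)])
mean_ols_err, mean_gls_err = errs.mean(axis=0)
```

With strong cross-equation correlation and distinct regressors, the covariance-aware estimator's average error is visibly smaller, which is the finite-sample phenomenon the paper quantifies.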
Communication Efficient Decentralization for Smoothed Online Convex Optimization
Bhuyan, Neelkamal, Mukherjee, Debankur, Wierman, Adam
We study the multi-agent Smoothed Online Convex Optimization (SOCO) problem, where $N$ agents interact through a communication graph. In each round, each agent $i$ receives a strongly convex hitting cost function $f^i_t$ in an online fashion and selects an action $x^i_t \in \mathbb{R}^d$. The objective is to minimize the global cumulative cost, which includes the sum of individual hitting costs $f^i_t(x^i_t)$, a temporal "switching cost" for changing decisions, and a spatial "dissimilarity cost" that penalizes deviations in decisions among neighboring agents. We propose the first decentralized algorithm for multi-agent SOCO and prove its asymptotic optimality. Our approach allows each agent to operate using only local information from its immediate neighbors in the graph. For finite-time performance, we establish that the optimality gap in competitive ratio decreases with the time horizon $T$ and can be conveniently tuned based on the per-round computation available to each agent. Moreover, our results hold even when the communication graph changes arbitrarily and adaptively over time. Finally, we establish that the computational complexity per round depends only logarithmically on the number of agents and almost linearly on their degree within the graph, ensuring scalability for large-system implementations.
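The three cost components in the objective above (hitting, switching, dissimilarity) can be made concrete with a small simulation. Everything in the sketch is an assumption for illustration: quadratic hitting costs, a line graph, fixed weights, and a naive local-averaging update that is not the paper's decentralized algorithm.

```python
import numpy as np

# Per-round cost for a toy multi-agent SOCO instance: sum of hitting
# costs, a temporal switching cost, and a spatial dissimilarity cost
# over graph edges. N, d, the graph, and the weights are assumptions.
rng = np.random.default_rng(3)
N, d = 4, 2
edges = [(0, 1), (1, 2), (2, 3)]            # a line communication graph
theta = rng.normal(size=(N, d))             # minimizer of each agent's hitting cost

nbrs = {i: [] for i in range(N)}
for a, b in edges:
    nbrs[a].append(b)
    nbrs[b].append(a)

def round_cost(x, x_prev, beta=1.0, gamma=0.5):
    hit = sum(0.5 * np.sum((x[i] - theta[i]) ** 2) for i in range(N))
    switch = beta * sum(0.5 * np.sum((x[i] - x_prev[i]) ** 2) for i in range(N))
    dissim = gamma * sum(0.5 * np.sum((x[i] - x[j]) ** 2) for i, j in edges)
    return hit + switch + dissim

x = np.zeros((N, d))
costs = []
for t in range(20):
    # naive local update: each agent mixes its own hitting-cost
    # minimizer, its neighbours' average, and its previous action
    x_new = x.copy()
    for i in range(N):
        avg = np.mean([x[j] for j in nbrs[i]], axis=0)
        x_new[i] = 0.5 * theta[i] + 0.3 * avg + 0.2 * x[i]
    costs.append(round_cost(x_new, x))
    x = x_new
```

Note that each agent's update reads only its neighbours' actions, mirroring the locality constraint in the paper; the competitive-ratio analysis there concerns far more adversarial cost sequences than this static example.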
Reviews: Alternating minimization for dictionary learning with random initialization
This paper proposes and analyzes an alternating minimization-based algorithm to recover the dictionary matrix and sparse coefficient matrix in a dictionary learning setting. A primary component of the contribution comes in the form of an alternate analysis of the matrix uncertainty (MU) selector of Belloni, Rosenbaum, and Tsybakov, which accounts for worst-case rather than probabilistic corruptions. Pros: The contribution relaxes the conditions under which methods like this succeed, relative to existing works. Specifically, the motivation and result of this work amount to specifying sufficient conditions on the vectorized infinity norm of the unknown dictionary matrix, rather than its operator norm, under which provable recovery is possible. This has the effect of making the method potentially less dependent on the ambient dimensions, especially for "typical" constructions of (incoherent) dictionaries such as certain random generations.
Reviews: An Improved Analysis of Alternating Minimization for Structured Multi-Response Regression
This paper studies the multi-response regression model and, in particular, the alternating minimization algorithm for the problem. In contrast to prior work, this paper makes the following improvements: 1. It does not require the resampling assumptions that usually abound in results on alternating procedures. With good initialization, the procedure without resampling achieves what is usually the minimax rate for such problems. The major technique is generic chaining, which allows the authors to prove bounds that hold uniformly over all iterates. I liked the paper, and the result overall seems quite interesting, in particular points 1 and 2 above.
Robust Federated Finetuning of Foundation Models via Alternating Minimization of LoRA
Chen, Shuangyi, Ju, Yue, Dalal, Hardik, Zhu, Zhongwen, Khisti, Ashish
Parameter-Efficient Fine-Tuning (PEFT) has risen as an innovative training strategy that updates only a small subset of model parameters, significantly lowering both computational and memory demands. PEFT also helps decrease data transfer in federated learning settings, where communication depends on the size of the updates. In this work, we examine the limitations of previous studies that integrate LoRA, a well-known PEFT method, with federated fine-tuning, and then introduce RoLoRA, a robust federated fine-tuning framework that applies alternating minimization to LoRA, providing greater robustness to reductions in the number of fine-tuning parameters and to increasing data heterogeneity. Our results indicate that RoLoRA not only preserves the communication benefits but also substantially enhances robustness and effectiveness in multiple federated fine-tuning scenarios.
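The core idea of alternating minimization over the two LoRA factors can be shown on a linear toy problem. This is a hedged sketch, not RoLoRA's federated procedure: we fit a low-rank weight update $\Delta = BA$ to a known target by exactly minimizing over one factor while the other is frozen.

```python
import numpy as np

# Alternating least squares over the LoRA factors B (d_out x r) and
# A (r x d_in), fitting a rank-r target update. Sizes are illustrative
# assumptions; in actual LoRA fine-tuning the objective is a training
# loss rather than a Frobenius fit.
rng = np.random.default_rng(4)
d_out, d_in, r = 16, 12, 2
Delta_true = rng.normal(size=(d_out, r)) @ rng.normal(size=(r, d_in))

B = np.zeros((d_out, r))                     # standard LoRA init: B = 0
A = rng.normal(size=(r, d_in))               # ... and A random

for _ in range(10):
    # minimize ||Delta_true - B A||_F^2 over B with A frozen
    B = Delta_true @ np.linalg.pinv(A)
    # then over A with B frozen
    A = np.linalg.pinv(B) @ Delta_true

rel_err = np.linalg.norm(B @ A - Delta_true) / np.linalg.norm(Delta_true)
```

Because each half-step is an exact least-squares solve, the rank-r target is recovered essentially to machine precision here; updating one factor at a time is also what keeps the per-round communication of such a scheme small in a federated setting.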
Asymptotic Dynamics of Alternating Minimization for Non-Convex Optimization
Okajima, Koki, Takahashi, Takashi
This study investigates the asymptotic dynamics of alternating minimization applied to optimizing a bilinear non-convex function with normally distributed covariates. We employ the replica method from statistical physics in a multi-step approach to precisely trace the algorithm's evolution. Our findings indicate that the dynamics can be effectively described by a two-dimensional discrete stochastic process in which each step depends on all previous time steps, revealing a memory dependency in the procedure. The theoretical framework developed in this work is broadly applicable to the analysis of various iterative algorithms, extending beyond alternating minimization.
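The replica analysis itself is beyond a code snippet, but the object it tracks (the iterate trajectory of alternating minimization on a bilinear problem) is easy to simulate. The sketch below is an illustrative assumption: a noisy rank-one model where each alternating step is an exact least-squares solve, with the overlap to the ground truth recorded per iteration.

```python
import numpy as np

# Alternating minimization of ||Y - u v^T||_F^2 on a noisy rank-one
# model, from a random initialization. We record the overlap of each
# iterate with the planted direction to trace the dynamics. The size n
# and noise level are illustrative assumptions.
rng = np.random.default_rng(5)
n = 400
u_star = rng.normal(size=n) / np.sqrt(n)
v_star = rng.normal(size=n) / np.sqrt(n)
Y = np.outer(u_star, v_star) + 0.1 * rng.normal(size=(n, n)) / np.sqrt(n)

v = rng.normal(size=n)                       # random initialization
overlaps = []
for _ in range(15):
    u = Y @ v / (v @ v)                      # exact minimization over u
    v = Y.T @ u / (u @ u)                    # exact minimization over v
    overlaps.append(abs(v @ v_star) / (np.linalg.norm(v) * np.linalg.norm(v_star)))
```

Averaged over many draws, trajectories like `overlaps` are what an asymptotic theory of the algorithm's dynamics must reproduce, including any dependence of a step on the history of earlier iterates.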