ADMM algorithm


A Dynamically Weighted ADMM Framework for Byzantine Resilience

Vijay, Vishnu, Pant, Kartik A., Cho, Minhyun, Hwang, Inseok

arXiv.org Artificial Intelligence

The alternating direction method of multipliers (ADMM) is a popular method for solving distributed consensus optimization problems through efficient communication among the nodes of a network. However, in the presence of faulty or attacked nodes, even a small perturbation (or the sharing of false data) during communication can cause the solution to diverge. To address this issue, we consider ADMM under Byzantine threats, where an unknown subset of nodes is subject to Byzantine attacks or faults. We propose Dynamically Weighted ADMM (DW-ADMM), a novel variant of ADMM that uses dynamic weights on the edges of the network, thus promoting resilient distributed optimization. We establish that the proposed method (i) produces a nearly identical solution to conventional ADMM in the error-free case, and (ii) guarantees a solution that remains bounded with respect to the global minimizer even under Byzantine threats. Finally, we demonstrate the effectiveness of the proposed algorithm in an illustrative numerical simulation.
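
For orientation, the baseline the paper hardens is consensus ADMM, in which each node keeps a local copy of the decision variable and the copies are driven toward agreement. The sketch below is a minimal, illustrative Python implementation of weighted consensus ADMM with scalar quadratic local costs; the function name, costs, and static weights are assumptions made here for illustration, and the weighting hook only indicates where a dynamic, trust-based rule such as DW-ADMM could intervene; it is not the authors' update.

    import numpy as np

    def weighted_consensus_admm(a, w, rho=1.0, iters=100):
        """Illustrative weighted consensus ADMM for min_x sum_i 0.5*(x - a_i)^2.
        Per-node weights w_i scale the penalty parameter; this only shows where
        (dynamic) weighting could enter, it is NOT the DW-ADMM rule."""
        a = np.asarray(a, dtype=float)
        n = a.size
        x = np.zeros(n)                       # local primal copies
        u = np.zeros(n)                       # scaled dual variables
        z = 0.0                               # consensus variable
        rho_i = rho * np.asarray(w, dtype=float)
        for _ in range(iters):
            # local update: argmin_x 0.5*(x - a_i)^2 + 0.5*rho_i*(x - z + u_i)^2
            x = (a + rho_i * (z - u)) / (1.0 + rho_i)
            # consensus update: penalty-weighted average of the local copies
            z = np.sum(rho_i * (x + u)) / np.sum(rho_i)
            # dual update
            u = u + x - z
        return z

    # With equal weights and honest nodes this converges to the average of a.
    print(weighted_consensus_admm([1.0, 2.0, 3.0], w=[1.0, 1.0, 1.0]))  # ~2.0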


Deep ADMM-Net for Compressive Sensing MRI

Yan Yang, Jian Sun, Huibin Li, Zongben Xu

Neural Information Processing Systems

Compressive Sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims at reconstructing MR images from a small amount of under-sampled data in k-space, thereby accelerating data acquisition in MRI. To improve the reconstruction accuracy and computational speed of current MRI systems, in this paper we propose a novel deep architecture, dubbed ADMM-Net. ADMM-Net is defined over a data-flow graph derived from the iterative procedure of the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a CS-based MRI model. In the training phase, all parameters of the net, e.g., image transforms and shrinkage functions, are discriminatively trained end-to-end using the L-BFGS algorithm. In the testing phase, it has computational overhead similar to ADMM but uses optimized parameters learned from the training data for the CS-based reconstruction task. Experiments on MR image reconstruction under different k-space sampling ratios demonstrate that it significantly improves upon the baseline ADMM algorithm and achieves high reconstruction accuracy at fast computational speed.
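
For context, the CS-MRI model that ADMM-Net unrolls is typically of the form below, and the three ADMM sub-steps that become the layers of the data-flow graph are, in a generic scaled form (the notation here is schematic; in ADMM-Net the transforms $D_l$, the shrinkage function $S$, and the penalty parameters are the quantities learned from data):

\[
\min_{x}\; \tfrac{1}{2}\|Ax - y\|_2^2 + \sum_{l} \lambda_l\, g(D_l x),
\]
\[
x^{(k)} = \Big(A^{\top}A + \sum_{l}\rho_l D_l^{\top}D_l\Big)^{-1}\Big(A^{\top}y + \sum_{l}\rho_l D_l^{\top}\big(z_l^{(k-1)} - \beta_l^{(k-1)}\big)\Big),
\]
\[
z_l^{(k)} = S\!\big(D_l x^{(k)} + \beta_l^{(k-1)};\, \lambda_l/\rho_l\big),
\qquad
\beta_l^{(k)} = \beta_l^{(k-1)} + D_l x^{(k)} - z_l^{(k)},
\]

where $A$ is the under-sampled Fourier encoding operator and $y$ the acquired k-space data.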


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The paper analyzes ADMM with the quadratic term in the augmented Lagrangian replaced by a Bregman divergence. The authors prove convergence-rate results in which the $\ell_2$ distance of the standard ADMM guarantee is replaced by the corresponding Bregman divergence. As with mirror descent, this can improve the dimension-dependent constants in the convergence bounds. The authors then demonstrate the utility of their algorithm on synthetic instances of a specific optimization problem (the mass transportation problem).
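
Schematically, the modification under review starts from the standard augmented Lagrangian

\[
L_\rho(x, z, y) = f(x) + g(z) + \langle y,\, Ax + Bz - c\rangle + \tfrac{\rho}{2}\|Ax + Bz - c\|_2^2,
\]

and replaces the quadratic penalty with a Bregman divergence

\[
B_\phi(u, v) = \phi(u) - \phi(v) - \langle \nabla\phi(v),\, u - v\rangle,
\]

so that, as with mirror descent, a generating function $\phi$ adapted to the geometry of the problem (e.g. negative entropy, which yields the KL divergence) replaces the Euclidean geometry in the updates and the resulting bounds. The exact placement of the divergence follows the paper; the expressions above are only a schematic.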


Deep learning with Elastic Averaging SGD

Neural Information Processing Systems

We study the problem of stochastic optimization for deep learning in a parallel computing environment under communication constraints. A new algorithm is proposed for this setting in which the communication and coordination of work among concurrent processes (local workers) is based on an elastic force linking the parameters they compute with a center variable stored by the parameter server (master). The algorithm enables the local workers to perform more exploration, i.e., it allows the local variables to fluctuate further from the center variable by reducing the amount of communication between local workers and the master. We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance. We propose synchronous and asynchronous variants of the new algorithm.
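
In the synchronous variant, the elastic coupling typically amounts to updates of roughly the following form (stated here as a schematic, with $p$ workers, learning rate $\eta$, and elastic coefficient $\alpha = \eta\rho$):

\[
x^{i}_{t+1} = x^{i}_{t} - \eta\, g^{i}_{t}(x^{i}_{t}) - \alpha\,\big(x^{i}_{t} - \tilde{x}_{t}\big),
\qquad
\tilde{x}_{t+1} = \tilde{x}_{t} + \alpha \sum_{i=1}^{p}\big(x^{i}_{t} - \tilde{x}_{t}\big),
\]

where $g^{i}_{t}$ is the stochastic gradient computed by worker $i$ and $\tilde{x}$ is the center variable held by the master; a smaller $\rho$ weakens the elastic force and permits more exploration.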


Learning to accelerate distributed ADMM using graph neural networks

Doerks, Henri, Häusner, Paul, Escobar, Daniel Hernández, Sjölund, Jens

arXiv.org Artificial Intelligence

Distributed optimization is fundamental in large-scale machine learning and control applications. Among existing methods, the Alternating Direction Method of Multipliers (ADMM) has gained popularity due to its strong convergence guarantees and suitability for decentralized computation. However, ADMM often suffers from slow convergence and sensitivity to hyperparameter choices. In this work, we show that distributed ADMM iterations can be naturally represented within the message-passing framework of graph neural networks (GNNs). Building on this connection, we propose to learn adaptive step sizes and communication weights with a graph neural network that predicts the hyperparameters based on the iterates. By unrolling ADMM for a fixed number of iterations, we train the network parameters end-to-end to minimize the error of the final iterates for a given problem class, while preserving the algorithm's convergence properties. Numerical experiments demonstrate that our learned variant consistently improves convergence speed and solution quality compared to standard ADMM. The code is available at https://github.com/paulhausner/learning-distributed-admm.
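
The structural correspondence the paper exploits can be made concrete with a small sketch: one iteration of decentralized consensus ADMM is a message-passing round in which each node aggregates its neighbors' iterates and then applies a local update. The Python sketch below uses a common decentralized consensus-ADMM update with scalar quadratic local costs and a fixed penalty parameter rho; the function and variable names are assumptions for illustration, and in the learned variant a GNN would predict quantities such as the step sizes and communication weights instead of keeping them fixed.

    import numpy as np

    def admm_message_passing_round(x, lam, a, neighbors, rho=1.0):
        """One round of decentralized consensus ADMM, written as message passing.
        x[i]: local primal iterate, lam[i]: local dual state,
        a[i]: target of the local cost f_i(x) = 0.5*(x - a_i)^2,
        neighbors[i]: list of neighbors of node i in the communication graph."""
        n = len(x)
        deg = np.array([len(neighbors[i]) for i in range(n)], dtype=float)
        # message passing: gather the neighbors' current primal iterates
        agg = np.array([sum(x[j] for j in neighbors[i]) for i in range(n)])
        # local primal update:
        # argmin_v 0.5*(v - a_i)^2 + lam_i*v + rho*sum_j (v - (x_i + x_j)/2)^2
        x_new = (a - lam + rho * (deg * x + agg)) / (1.0 + 2.0 * rho * deg)
        # second message round: gather the neighbors' updated iterates
        agg_new = np.array([sum(x_new[j] for j in neighbors[i]) for i in range(n)])
        # dual update driven by local disagreement with the neighbors
        lam_new = lam + rho * (deg * x_new - agg_new)
        return x_new, lam_new

    # Example: a 3-node path graph; the iterates converge to the average of a.
    a = np.array([1.0, 2.0, 3.0])
    x, lam = np.zeros(3), np.zeros(3)
    nbrs = {0: [1], 1: [0, 2], 2: [1]}
    for _ in range(200):
        x, lam = admm_message_passing_round(x, lam, a, nbrs, rho=1.0)
    print(x)  # approximately [2.0, 2.0, 2.0]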


Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control: Non-Penalty Approach

Feng, Lechen, Li, Xun, Ni, Yuan-Hua

arXiv.org Artificial Intelligence

In [1], the distributed linear-quadratic problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ) are formulated as a nonsmooth, nonconvex optimization problem with affine constraints. Moreover, [1] considers a penalty approach and studies the PALM (proximal alternating linearized minimization) algorithm, with convergence and complexity analysis. In this paper, we aim to address the inherent drawbacks of the penalty approach, such as the challenge of tuning the penalty parameter and the risk of introducing spurious stationary points. Specifically, we first reformulate the SF-LQ and DFT-LQ problems from an epi-composition function perspective, aiming to solve the constrained problem directly. Then, from a theoretical viewpoint, we revisit the alternating direction method of multipliers (ADMM) and establish its convergence to the set of cluster points under certain assumptions. When these assumptions do not hold, we show that alternative approaches combining subgradient descent with difference-of-convex relaxation methods can be effectively utilized. In summary, our results enable the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates, restrictive structural assumptions, or penalty formulations that incorporate constraints into the cost function.
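
For reference, the two-block ADMM being revisited applies to problems of the form $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$ and iterates, in scaled form,

\[
x^{k+1} = \arg\min_{x}\; f(x) + \tfrac{\rho}{2}\big\|Ax + Bz^{k} - c + u^{k}\big\|_2^2,
\]
\[
z^{k+1} = \arg\min_{z}\; g(z) + \tfrac{\rho}{2}\big\|Ax^{k+1} + Bz - c + u^{k}\big\|_2^2,
\qquad
u^{k+1} = u^{k} + Ax^{k+1} + Bz^{k+1} - c.
\]

In the nonsmooth, nonconvex setting considered here, convergence of these iterates (to the set of cluster points) is exactly what must be re-established under additional assumptions, which is the role of the ADMM analysis in the paper.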


Efficient Global Optimization of Two-Layer ReLU Networks: Quadratic-Time Algorithms and Adversarial Training

Bai, Yatong, Gautam, Tanmay, Sojoudi, Somayeh

arXiv.org Artificial Intelligence

The non-convexity of the artificial neural network (ANN) training landscape brings inherent optimization difficulties. While the traditional back-propagation stochastic gradient descent (SGD) algorithm and its variants are effective in certain cases, they can become stuck at spurious local minima and are sensitive to initializations and hyperparameters. Recent work has shown that the training of an ANN with ReLU activations can be reformulated as a convex program, bringing hope for globally optimizing interpretable ANNs. However, naively solving the convex training formulation has exponential complexity, and even an approximation heuristic requires cubic time. In this work, we characterize the quality of this approximation and develop two efficient algorithms that train ANNs with global convergence guarantees. The first algorithm is based on the alternating direction method of multipliers (ADMM). It solves both the exact convex formulation and the approximate counterpart. Linear global convergence is achieved, and the first several iterations often yield a solution with high prediction accuracy. When solving the approximate formulation, the per-iteration time complexity is quadratic. The second algorithm, based on the "sampled convex programs" theory, solves unconstrained convex formulations and converges to an approximately globally optimal classifier. The non-convexity of the ANN training landscape is exacerbated when adversarial training is considered. We apply robust convex optimization theory to convex training and develop convex formulations that train ANNs robust to adversarial inputs. Our analysis explicitly focuses on one-hidden-layer fully connected ANNs, but it can be extended to more sophisticated architectures.


Distributed Primal-Dual Algorithms: Unification, Connections, and Insights

Wu, Runxiong, Liu, Dong, Wang, Xueqin, Wang, Andi

arXiv.org Machine Learning

We study primal-dual algorithms for general empirical risk minimization problems in distributed settings, focusing on two prominent classes of algorithms. The first class is the communication-efficient distributed dual coordinate ascent (CoCoA) method, derived from the coordinate ascent method for solving the dual problem. The second class is the alternating direction method of multipliers (ADMM), including consensus ADMM, linearized ADMM, and proximal ADMM. We demonstrate that both classes of algorithms can be transformed into a unified update form that involves only primal and dual variables. This discovery reveals key connections between the two classes: CoCoA can be interpreted as a special case of proximal ADMM for solving the dual problem, while consensus ADMM is closely related to a proximal ADMM algorithm. It also provides the insight that, by adjusting the augmented Lagrangian parameter, the ADMM variants can easily be made to outperform the CoCoA variants. We further explore linearized versions of ADMM and analyze the effects of the tuning parameters on these ADMM variants in the distributed setting. Our theoretical findings are supported by extensive simulation studies and real-world data analysis.
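
Schematically, the "proximal" qualifier refers to adding a positive semidefinite proximal term to an ADMM subproblem; for the generic two-block problem $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$, the x-step becomes

\[
x^{k+1} = \arg\min_{x}\; f(x) + \tfrac{\rho}{2}\big\|Ax + Bz^{k} - c + u^{k}\big\|_2^2 + \tfrac{1}{2}\big\|x - x^{k}\big\|_{P}^2,
\qquad P \succeq 0,
\]

with the z- and dual updates unchanged. Choosing $P$ to cancel the quadratic coupling term gives linearized ADMM, and, as the paper argues, suitable choices applied to the dual of the empirical risk minimization problem recover CoCoA-style coordinate-ascent updates.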


A Federated Distributionally Robust Support Vector Machine with Mixture of Wasserstein Balls Ambiguity Set for Distributed Fault Diagnosis

Ibrahim, Michael, Rozas, Heraldo, Gebraeel, Nagi, Xie, Weijun

arXiv.org Machine Learning

The training of classification models for fault diagnosis tasks using geographically dispersed data is a crucial task for original equipment manufacturers (OEMs) seeking to provide long-term service contracts (LTSCs) to their customers. Due to privacy and bandwidth constraints, such models must be trained in a federated fashion. Moreover, due to harsh industrial settings, the data often suffer from feature and label uncertainty. Therefore, we study the problem of training a distributionally robust (DR) support vector machine (SVM) in a federated fashion over a network comprising a central server and $G$ clients without sharing data. We consider the setting where the local data of each client $g$ are sampled from a unique true distribution $\mathbb{P}_g$, and the clients can only communicate with the central server. We propose a novel Mixture of Wasserstein Balls (MoWB) ambiguity set that relies on local Wasserstein balls centered at the empirical distribution of the data at each client. We study theoretical aspects of the proposed ambiguity set, deriving its out-of-sample performance guarantees and demonstrating that it naturally allows for the separability of the DR problem. Subsequently, we propose two distributed optimization algorithms for training the global federated DR-SVM (FDR-SVM): i) a subgradient-method-based algorithm, and ii) an alternating direction method of multipliers (ADMM)-based algorithm. We derive the optimization problems to be solved by each client and provide closed-form expressions for the computations performed by the central server during each iteration of both algorithms. Finally, we thoroughly examine the performance of the proposed algorithms in a series of numerical experiments utilizing both simulated data and popular real-world datasets.


Compressed Sensor Caching and Collaborative Sparse Data Recovery with Anchor Alignment

Yang, Yi-Jen, Yang, Ming-Hsun, Wu, Jwo-Yuh, Hong, Y.-W. Peter

arXiv.org Artificial Intelligence

This work examines the compressed sensor caching problem in wireless sensor networks and devises efficient distributed sparse data recovery algorithms to enable collaboration among multiple caches. In this problem, each cache is only allowed to access measurements from a small subset of sensors within its vicinity in order to reduce both the cache size and the data acquisition overhead. To enable reliable data recovery with limited access to measurements, we propose a distributed sparse data recovery method, called the collaborative sparse recovery by anchor alignment (CoSR-AA) algorithm, in which collaboration among caches is enabled by aligning their locally recovered data at a few anchor nodes. The proposed algorithm is based on the consensus alternating direction method of multipliers (ADMM) algorithm, but with message exchange reduced by the proposed anchor alignment strategy. Then, by deep unfolding of the ADMM iterations, we further propose the Deep CoSR-AA algorithm, which significantly reduces the number of iterations. The result is a graph neural network architecture in which message exchange is performed more efficiently by an embedded autoencoder. Simulations demonstrate the effectiveness of the proposed collaborative recovery algorithms in terms of improved reconstruction quality and reduced communication overhead due to anchor alignment.
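
As background for the sparse-recovery building block, the sketch below is the textbook (centralized) ADMM solver for the LASSO problem $\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$, written in Python with illustrative names; CoSR-AA instead runs a consensus version of such updates across multiple caches and aligns the locally recovered data at anchor nodes, which this sketch does not attempt to reproduce.

    import numpy as np

    def soft_threshold(v, kappa):
        """Elementwise shrinkage, the proximal operator of kappa*||.||_1."""
        return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

    def lasso_admm(A, y, lam=0.5, rho=1.0, iters=300):
        """Textbook ADMM for 0.5*||Ax - y||^2 + lam*||x||_1 (centralized)."""
        n = A.shape[1]
        x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
        # factor the x-update system once and reuse it every iteration
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
        Aty = A.T @ y
        for _ in range(iters):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Aty + rho * (z - u)))
            z = soft_threshold(x + u, lam / rho)
            u = u + x - z
        return z

    # Example usage on a synthetic sparse signal.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    x_true = np.zeros(100)
    x_true[[3, 17, 58]] = [1.0, -2.0, 0.5]
    y = A @ x_true
    print(np.flatnonzero(lasso_admm(A, y)))  # indices of the estimated support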