Optimization
Distributed Optimization, Averaging via ADMM, and Network Topology
França, Guilherme, Bento, José
There has been an increasing necessity for scalable optimization methods, especially due to the explosion in the size of datasets and model complexity in modern machine learning applications. Scalable solvers often distribute the computation over a network of processing units. For simple algorithms such as gradient descent the dependency of the convergence time with the topology of this network is well-known. However, for more involved algorithms such as the Alternating Direction Methods of Multipliers (ADMM) much less is known. At the heart of many distributed optimization algorithms there exists a gossip subroutine which averages local information over the network, and whose efficiency is crucial for the overall performance of the method. In this paper we review recent research in this area and, with the goal of isolating such a communication exchange behaviour, we compare different algorithms when applied to a canonical distributed averaging consensus problem. We also show interesting connections between ADMM and lifted Markov chains besides providing an explicitly characterization of its convergence and optimal parameter tuning in terms of spectral properties of the network. Finally, we empirically study the connection between network topology and convergence rates for different algorithms on a real world problem of sensor localization.
Particle Swarm Optimized Federated Learning For Industrial IoT and Smart City Services
Qolomany, Basheer, Ahmad, Kashif, Al-Fuqaha, Ala, Qadir, Junaid
Most of the research on Federated Learning (FL) has focused on analyzing global optimization, privacy, and communication, with limited attention focusing on analyzing the critical matter of performing efficient local training and inference at the edge devices. One of the main challenges for successful and efficient training and inference on edge devices is the careful selection of parameters to build local Machine Learning (ML) models. To this aim, we propose a Particle Swarm Optimization (PSO)-based technique to optimize the hyperparameter settings for the local ML models in an FL environment. We evaluate the performance of our proposed technique using two case studies. First, we consider smart city services and use an experimental transportation dataset for traffic prediction as a proxy for this setting. Second, we consider Industrial IoT (IIoT) services and use the real-time telemetry dataset to predict the probability that a machine will fail shortly due to component failures. Our experiments indicate that PSO provides an efficient approach for tuning the hyperparameters of deep Long short-term memory (LSTM) models when compared to the grid search method. Our experiments illustrate that the number of clients-server communication rounds to explore the landscape of configurations to find the near-optimal parameters are greatly reduced (roughly by two orders of magnitude needing only 2%--4% of the rounds compared to state of the art non-PSO-based approaches). We also demonstrate that utilizing the proposed PSO-based technique to find the near-optimal configurations for FL and centralized learning models does not adversely affect the accuracy of the models.
A General Framework for Fairness in Multistakeholder Recommendations
Chaudhari, Harshal A., Lin, Sangdi, Linda, Ondrej
Contemporary recommender systems act as intermediaries on multi-sided platforms serving high utility recommendations from sellers to buyers. Such systems attempt to balance the objectives of multiple stakeholders including sellers, buyers, and the platform itself. The difficulty in providing recommendations that maximize the utility for a buyer, while simultaneously representing all the sellers on the platform has lead to many interesting research problems.Traditionally, they have been formulated as integer linear programs which compute recommendations for all the buyers together in an \emph{offline} fashion, by incorporating coverage constraints so that the individual sellers are proportionally represented across all the recommended items. Such approaches can lead to unforeseen biases wherein certain buyers consistently receive low utility recommendations in order to meet the global seller coverage constraints. To remedy this situation, we propose a general formulation that incorporates seller coverage objectives alongside individual buyer objectives in a real-time personalized recommender system. In addition, we leverage highly scalable submodular optimization algorithms to provide recommendations to each buyer with provable theoretical quality bounds. Furthermore, we empirically evaluate the efficacy of our approach using data from an online real-estate marketplace.
On Communication Compression for Distributed Optimization on Heterogeneous Data
Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two standard and general types of methods: (i) distributed quantized SGD (D-QSGD) with arbitrary unbiased quantizers and (ii) distributed SGD with error-feedback and biased compressors (D-EF-SGD) in the heterogeneous (non-iid) data setting. Our results indicate that D-EF-SGD is much less affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if data-skewness is high. We propose two alternatives that are not (or much less) affected by heterogenous data distributions: a new method that is only applicable to strongly convex problems, and we point out a more general approach that is applicable to linear compressors.
On the implementation of a global optimization method for mixed-variable problems
An optimization problem without any structural information on the objective function or the constraints, but for which we have the ability to evaluate them at given points, is called a black-box problem. The area of derivativefree optimization is dedicated to the study of optimization algorithms that do not rely on computing the partial derivatives of the objective function, and it is naturally applied to black-box problems. Many optimization problems in engineering are solved by treating them as a black box, for two main reasons: first, the objective function may not be known in an explicit form, e.g., when it is the output of a complex computer simulation; second, even if derivatives may exist and be computable, the effort required may make it impractical, or the low accuracy of their computation may make them unreliable. This paper discusses the implementation of a global derivative-free optimization algorithm that is specifically aimed at black-box problems with expensive objective function evaluations.
On the Global Convergence Rates of Softmax Policy Gradient Methods
Mei, Jincheng, Xiao, Chenjun, Szepesvari, Csaba, Schuurmans, Dale
We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a \L{}ojasiewicz inequality, and the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy regularized policy gradient and show that it enjoys a significantly faster linear convergence rate $O(e^{-t})$ toward softmax optimal policy. This result resolves an open question in the recent literature. Finally, combining the above two results and additional new $\Omega(1/t)$ lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. The separation of rates is further explained using the notion of non-uniform \L{}ojasiewicz degree. These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.
Robust, Accurate Stochastic Optimization for Variational Inference
Dhaka, Akash Kumar, Catalina, Alejandro, Andersen, Michael Riis, Magnusson, Måns, Huggins, Jonathan H., Vehtari, Aki
We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution, (2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the best-case scenario when the exact posterior belongs to the assumed variational family, common stochastic optimization methods lead to poor variational approximations if the problem dimension is moderately large. We also demonstrate that these methods are not robust across diverse model types. Motivated by these findings, we develop a more robust and accurate stochastic optimization framework by viewing the underlying optimization algorithm as producing a Markov chain. Our approach is theoretically motivated and includes a diagnostic for convergence and a novel stopping rule, both of which are robust to noisy evaluations of the objective function. We show empirically that the proposed framework works well on a diverse set of models: it can automatically detect stochastic optimization failure or inaccurate variational approximation.
Fast algorithms for robust principal component analysis with an upper bound on the rank
Sha, Ningyu, Shi, Lei, Yan, Ming
The robust principal component analysis (RPCA) decomposes a data matrix into a low-rank part and a sparse part. There are mainly two types of algorithms for RPCA. The first type of algorithm applies regularization terms on the singular values of a matrix to obtain a low-rank matrix. However, calculating singular values can be very expensive for large matrices. The second type of algorithm replaces the low-rank matrix as the multiplication of two small matrices. They are faster than the first type because no singular value decomposition (SVD) is required. However, the rank of the low-rank matrix is required, and an accurate rank estimation is needed to obtain a reasonable solution. In this paper, we propose algorithms that combine both types. Our proposed algorithms require an upper bound of the rank and SVD on small matrices. First, they are faster than the first type because the cost of SVD on small matrices is negligible. Second, they are more robust than the second type because an upper bound of the rank instead of the exact rank is required. Furthermore, we apply the Gauss-Newton method to increase the speed of our algorithms. Numerical experiments show the better performance of our proposed algorithms.
How AI helps to finally let the fusion reactor become a reality
In Marvel's comic universe following the end of World War II Howard Stark tries to tap into the energy of the mystical "Tesseract" and develops the arc reactor -- a technology he believes to hold the key to unlimited, sustainable energy and would make nuclear energy look like an AAA battery. However, the perfect reactor cannot be built without a certain theoretical element and he lacks the technology to synthesize it. In the film "Iron Man", his son Tony Stark builds a miniature version of the Arc Reactor when held hostage in an Afghan cave to power an electromagnet, which keeps deadly shrapnel from piercing his heart. Even this small reactor has a remarkable output of 3 GJ/s -- as much as three times the average energy produced by a nuclear power plant. As the reactor's waste products threaten to poison him, Tony searches for new elements for the reaction.
An enhanced simulation-based iterated local search metaheuristic for gravity fed water distribution network design optimization
Martinho, Willian C. S., Melo, Rafael A., Sörensen, Kenneth
The gravity fed water distribution network design (WDND) optimization problem consists in determining the pipe diameters of a water network such that hydraulic constraints are satisfied and the total cost is minimized. Traditionally, such design decisions are made on the basis of expert experience. When networks increase in size, however, rules of thumb will rarely lead to near optimal decisions. Over the past thirty years, a large number of techniques have been developed to tackle the problem of optimally designing a water distribution network. In this paper, we tackle the NP-hard water distribution network design (WDND) optimization problem in a multi-period setting where time varying demand patterns occur. We propose a new simulation-based iterated local search metaheuristic which further explores the structure of the problem in an attempt to obtain high quality solutions. Computational experiments show that our approach is very competitive as it is able to improve over a state-of-the-art metaheuristic for most of the performed tests. Furthermore, it converges much faster to low cost solutions and demonstrates a more robust performance in that it obtains smaller deviations from the best known solutions.