Optimization
Tracking Performance of Online Stochastic Learners
Vlaski, Stefan, Rizk, Elsa, Sayed, Ali H.
The utilization of online stochastic algorithms is popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches. When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy. Building on analogies with the study of adaptive filters, we establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models. The link allows us to infer the tracking performance from steady-state expressions directly and almost by inspection.
Generalized Flexible Hybrid Cable-Driven Robot (HCDR): Modeling, Control, and Analysis
Qi, Ronghuai, Khajepour, Amir, Melek, William W.
This paper presents a generalized flexible Hybrid Cable-Driven Robot (HCDR). For the proposed HCDR, the derivation of the equations of motion and proof provide a very effective way to find items for generalized system modeling. The proposed dynamic modeling approach avoids the drawback of traditional methods and can be easily extended to other types of hybrid robots, such as a robot arm mounted on an aircraft platform. Additionally, another goal of this paper is to develop integrated control systems to reduce vibrations and improve the accuracy and performance of the HCDR. To achieve this goal, redundancy resolution, stiffness optimization, and control strategies are studied. The proposed optimization problem and algorithm address the limitations of existing stiffness optimization approaches. Three types of control architecture are proposed, and their performances (i.e., reducing undesirable vibrations and trajectory tracking errors, especially for the end-effector) are evaluated using several well-designed case studies. Results show that the fully integrated control strategy can improve the tracking performance of the end-effector significantly.
Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms
Salim, Adil, Condat, Laurent, Mishchenko, Konstantin, Richtarik, Peter
We introduce a new primal-dual algorithm for minimizing the sum of three convex functions, each of which has its own oracle. Namely, the first one is differentiable, smooth and possibly stochastic, the second is proximable, and the last one is a composition of a proximable function with a linear map. Our theory covers several settings that are not tackled by any existing algorithm; we illustrate their importance with real-world applications. By leveraging variance reduction, we obtain convergence with linear rates under strong convexity and fast sublinear convergence under convexity assumptions. The proposed theory is simple and unified by the umbrella of stochastic Davis-Yin splitting, which we design in this work. Finally, we illustrate the efficiency of our method through numerical experiments.
Weighted Random Search for Hyperparameter Optimization
Florea, Adrian-Catalin, Andonie, Razvan
The vast majority of machine learning algorithms involve two different sets of parameters: the training parameters and the meta-parameters (also known as hyperparameters). While the training parameters are learned during the training phase, the values of the hyperparameters have to be specified before the learning phase. For instance, the hyperparameters of neural networks typically specify the architecture of the network (number and type of layers, number and type of nodes, etc). Determining the optimal combination of hyperparameter values leading to the best generalization performance can be done through repeated training and evaluation sessions, trying different combinations of hyperparameter values. We call each training evaluation process for one combination of hyperparameter values a trial. Each trial is computationally expensive, since it involves retraining the model.
From Local SGD to Local Fixed Point Methods for Federated Learning
Malinovsky, Grigory, Kovalev, Dmitry, Gasanov, Elnur, Condat, Laurent, Richtarik, Peter
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computations done locally on a mobile device. We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations. In both cases, the goal is to limit communication of the locally-computed variables, which is often the bottleneck in distributed frameworks. We perform convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach.
PackingSolver: a solver for Packing Problems
Fontan, Florian, Libralesso, Luc
In this article, we introduce PackingSolver, a new solver for two-dimensional two- and three-staged guillotine Packing Problems. It relies on a simple yet powerful anytime tree search algorithm called Memory Bounded A* (MBA*). This algorithm was first introduced in libralesso2020 for the 2018 ROADEF/EURO challenge glass cutting problem (https://www.roadef.org/challenge/2018/en/index.php), for which it had been ranked first during the final phase. In this article, we generalize it for a large variety of Cutting and Packing problems. The resulting program can tackle two-dimensional Bin Packing, Multiple Knapsack, and Strip Packing Problems, with two- or three-staged exact or non-exact guillotine cuts, the orientation of the first cut being imposed or not, and with or without item rotation. Despite its simplicity and genericity, computational experiments show that this approach is competitive compared to the other dedicated algorithms from the literature. It even returns state-of-the-art solutions on several variants. The combination of efficiency, ability to provide good solutions fast, simplicity and versatility makes it particularly suited for industrial applications, which require quickly developing algorithms implementing several business-specific constraints.
Distributed Primal-Dual Optimization for Online Multi-Task Learning
Conventional online multi-task learning algorithms suffer from two critical limitations: 1) Heavy communication caused by delivering high velocity of sequential data to a central machine; 2) Expensive runtime complexity for building task relatedness. To address these issues, in this paper we consider a setting where multiple tasks are geographically located in different places, where one task can synchronize data with others to leverage knowledge of related tasks. Specifically, we propose an adaptive primal-dual algorithm, which not only captures task-specific noise in adversarial learning but also carries out a projection-free update with runtime efficiency. Moreover, our model is well-suited to decentralized periodic-connected tasks as it allows the energy-starved or bandwidth-constraint tasks to postpone the update. Theoretical results demonstrate the convergence guarantee of our distributed algorithm with an optimal regret. Empirical results confirm that the proposed model is highly effective on various real-world datasets.
Surrogate-assisted performance tuning of knowledge discovery algorithms: application to clinical pathway evolutionary modeling
Funkner, Anastasia A., Yakovlev, Aleksey N., Kovalchuk, Sergey V.
The paper proposes an approach for surrogate-assisted tuning of knowledge discovery algorithms. The approach is based on the prediction of both the quality and performance of the target algorithm. The prediction is furtherly used as objectives for the optimization and tuning of the algorithm. The approach is investigated using clinical pathways (CP) discovery problem resolved using the evolutionary-based clustering of electronic health records (EHR). Target algorithm and the proposed approach were applied to the discovery of CPs for Acute Coronary Syndrome patients in 3434 EHRs of patients treated in Almazov National Medical Research Center (Saint Petersburg, Russia). The study investigates the possible acquisition of interpretable clusters of typical CPs within a single disease. It shows how the approach could be used to improve complex data-driven analytical knowledge discovery algorithms. The study of the results includes the feature importance of the best surrogate model and discover how the parameters of input data influence the predictions.
Mirrorless Mirror Descent: A More Natural Discretization of Riemannian Gradient Flow
Gunasekar, Suriya, Woodworth, Blake, Srebro, Nathan
We present a direct (primal only) derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential function. We argue that this discretization is more faithful to the geometry than Natural Gradient Descent, which is obtained by a "full" forward Euler discretization. This view helps shed light on the relationship between the methods and allows generalizing Mirror Descent to any Riemannian geometry, even when the metric tensor is not a Hessian, and thus there is no "dual".
Does Comma Selection Help To Cope With Local Optima
One hope of using non-elitism in evolutionary computation is that it aids leaving local optima. We perform a rigorous runtime analysis of a basic non-elitist evolutionary algorithm (EA), the $(\mu,\lambda)$ EA, on the most basic benchmark function with a local optimum, the jump function. We prove that for all reasonable values of the parameters and the problem, the expected runtime of the $(\mu,\lambda)$ EA is, apart from lower order terms, at least as large as the expected runtime of its elitist counterpart, the $(\mu+\lambda)$~EA (for which we conduct the first runtime analysis to allow this comparison). Consequently, the ability of the $(\mu,\lambda)$ EA to leave local optima to inferior solutions does not lead to a runtime advantage. We complement this lower bound with an upper bound that, for broad ranges of the parameters, is identical to our lower bound apart from lower order terms. This is the first runtime result for a non-elitist algorithm on a multi-modal problem that is tight apart from lower order terms.