Optimization
Visualization of Multi-Objective Switched Reluctance Machine Optimization at Multiple Operating Conditions with t-SNE
Zhang, Shen, Zhang, Shibo, Li, Sufei, Du, Liang, Habetler, Thomas G.
--The optimization of electric machines at multiple operating points is crucial for applications that require frequent changes on speeds and loads, such as the electric vehicles, to strive for the machine optimal performance across the entire driving cycle. However, the number of objectives that would need to be optimized would significantly increase with the number of operating points considered in the optimization, thus posting a potential problem in regards to the visualization techniques currently in use, such as in the scatter plots of Pareto fronts, the parallel coordinates, and in the principal component analysis (PCA), inhibiting their ability to provide machine designers with intuitive and informative visualizations of all of the design candidates and their ability to pick a few for further fine-tuning with performance verification. Therefore, this paper proposes the utilization of t-distributed stochastic neighbor embedding (t-SNE) to visualize all of the optimization objectives of various electric machines design candidates with various operating conditions, which constitute a high-dimensional set of data that would lie on several different, but related, low-dimensional manifolds. Finally, two case studies of switched reluctance machines (SRM) are presented to illustrate the superiority of then t-SNE when compared to traditional visualization techniques used in electric machine optimizations. The process of electric machine design is a complex mixture of multi-physics field interactions and multi-objective optimizations [1].
Lookahead Bayesian Optimization via Rollout: Guarantees and Sequential Rolling Horizons
We consider the optimization problem: x arg max x X f (x), (1) where x is a d-dimensional vector and X is a compact (closed and bounded) set in R d . Given limited budget B, BO aims to search for the optimal x by itera-tively updating a surrogate model of f (x), where this surrogate is used to find the next design to evaluate. Typically, in BO, the surrogate model is a Gaussian process ( GP), due to its Bayesian interpretation and uncertainty quantification capability (see Rasmussen (2003) for more information). More specifically, given the current data D k, BO aims to determine the next informative sampling point x k 1 by solving the auxiliary problem: x k 1: x arg max x X Q k(x; D k). (2) where Q k is a acquisition/utility function that only involves evaluating the surrogate and not the expensive objective function f . Typically, evaluation of acquisition function is relatively cheap. The rationale is to seek design points that produce maximum increment of the objective function. After Eq. (2) is solved, the iterative algorithm proceeds by augmenting the current training data D k with a new observation to obtain D k 1 D k { (x k 1,y k 1) }. Popular choices of acquisition functions are entropy search (ES) (Hennig and Schuler, 2012), predictive entropy search (PES) (Hern andez-Lobato et al., 2014) and expectation improvement (EI) (Lam et al., 2016). All aforementioned functions exploit myopic strategies and ignore the future information.
Machine Learning for high speed channel optimization
He, Jiayi, Kumar, Aravind Sampath, Chada, Arun, Mutnury, Bhyrav, Drewniak, James
-- Design of printed circuit board (PCB) stack - up requires the consideration of characteristic impedance, insertion loss and crosstalk. As there are many parameters in a PCB stack - up design, the optimization of these parameters needs to be efficient and accurate. A le ss optimal stack - up would lead to expensive PCB material choices in high speed designs. In this paper, a n efficient global optimization method using parallel and intelligent Bayesian optimization is proposed for the stripline design . In high speed system design, optimizing printed circuit board (PCB) stack - up is playing a more and more important role in design stage.
Does Adam optimizer keep close to the optimal point?
Bae, Kiwook, Ryu, Heechang, Shin, Hayong
The adaptive optimizer for training neural networks has continually evolved to overcome the limitations of the previously proposed adaptive methods. Recent studies have found the rare counterexamples that Adam cannot converge to the optimal point. Those counterexamples reveal the distortion of Adam due to a small second momentum from a small gradient. Unlike previous studies, we show Adam cannot keep closer to the optimal point for not only the counterexamples but also a general convex region when the effective learning rate exceeds the certain bound. Subsequently, we propose an algorithm that overcomes Adam's limitation and ensures that it can reach and stay at the optimal point region.
On the Convergence of Local Descent Methods in Federated Learning
Haddadpour, Farzin, Mahdavi, Mehrdad
In federated distributed learning, the goal is to optimize a global training objective defined over distributed devices, where the data shard at each device is sampled from a possibly different distribution (a.k.a., heterogeneous or non i.i.d. data samples). In this paper, we generalize the local stochastic and full gradient descent with periodic averaging-- originally designed for homogeneous distributed optimization, to solve nonconvex optimization problems in federated learning. Although scant research is available on the effectiveness of local SGD in reducing the number of communication rounds in homogeneous setting, its convergence and communication complexity in heterogeneous setting is mostly demonstrated empirically and lacks through theoretical understating. To bridge this gap, we demonstrate that by properly analyzing the effect of unbiased gradients and sampling schema in federated setting, under mild assumptions, the implicit variance reduction feature of local distributed methods generalize to heterogeneous data shards and exhibits the best known convergence rates of homogeneous setting both in general nonconvex and under {\pl}~ condition (generalization of strong-convexity). Our theoretical results complement the recent empirical studies that demonstrate the applicability of local GD/SGD to federated learning. We also specialize the proposed local method for networked distributed optimization. To the best of our knowledge, the obtained convergence rates are the sharpest known to date on the convergence of local decant methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.
The importance of evaluating the complete automated knowledge-based planning pipeline
Babier, Aaron, Mahmood, Rafid, McNiven, Andrea L., Diamant, Adam, Chan, Timothy C. Y.
We determine how prediction methods combine with optimization methods in two-stage knowledge-based planning (KBP) pipelines to produce radiation therapy treatment plans. We trained two dose prediction methods, a generative adversarial network (GAN) and a random forest (RF) with the same 130 treatment plans. The models were applied to 87 out-of-sample patients to create two sets of predicted dose distributions that were used as input to two optimization models. The first optimization model, inverse planning (IP), estimates weights for dose-objectives from a predicted dose distribution and generates new plans using conventional inverse planning. The second optimization model, dose mimicking (DM), minimizes the sum of one-sided quadratic penalties between the predictions and the generated plans using several dose-objectives. Altogether, four KBP pipelines (GAN-IP, GAN-DM, RF-IP, and RF-DM) were constructed and benchmarked against the corresponding clinical plans using clinical criteria; the error of both prediction methods was also evaluated. The best performing plans were GAN-IP plans, which satisfied the same criteria as their corresponding clinical plans (78%) more often than any other KBP pipeline. However, GAN did not necessarily provide the best prediction for the second-stage optimization models. Specifically, both the RF-IP and RF-DM plans satisfied all clinical criteria 25% and 15% more often than GAN-DM plans (the worst performing planning), respectively. GAN predictions also had a higher mean absolute error (3.9 Gy) than those from RF (3.6 Gy). We find that state-of-the-art prediction methods when paired with different optimization algorithms, produce treatment plans with considerable variation in quality.
Scaling structural learning with NO-BEARS to infer causal transcriptome networks
Lee, Hao-Chih, Danieletto, Matteo, Miotto, Riccardo, Cherng, Sarah T., Dudley, Joel T.
Constructing gene regulatory networks is a critical step in revealing disease mechanisms from transcriptomic data. In this work, we present NO-BEARS, a novel algorithm for estimating gene regulatory networks. The NO-BEARS algorithm is built on the basis of the NOTEARS algorithm with two improvements. First, we propose a new constraint and its fast approximation to reduce the computational cost of the NO-TEARS algorithm. Next, we introduce a polynomial regression loss to handle non-linearity in gene expressions. Our implementation utilizes modern GPU computation that can decrease the time of hours-long CPU computation to seconds. Using synthetic data, we demonstrate improved performance, both in processing time and accuracy, on inferring gene regulatory networks from gene expression data.
Graph Structured Prediction Energy Networks
Graber, Colin, Schwing, Alexander
For joint inference over multiple variables, a variety of structured prediction techniques have been developed to model correlations among variables and thereby improve predictions. However, many classical approaches suffer from one of two primary drawbacks: they either lack the ability to model high-order correlations among variables while maintaining computationally tractable inference, or they do not allow to explicitly model known correlations. To address this shortcoming, we introduce `Graph Structured Prediction Energy Networks,' for which we develop inference techniques that allow to both model explicit local and implicit higher-order correlations while maintaining tractability of inference. We apply the proposed method to tasks from the natural language processing and computer vision domain and demonstrate its general utility.
Certifiable Robustness to Graph Perturbations
Bojchevski, Aleksandar, Günnemann, Stephan
Despite the exploding interest in graph neural networks there has been little effort to verify and improve their robustness. This is even more alarming given recent findings showing that they are extremely vulnerable to adversarial attacks on both the graph structure and the node attributes. We propose the first method for verifying certifiable (non-)robustness to graph perturbations for a general class of models that includes graph neural networks and label/feature propagation. By exploiting connections to PageRank and Markov decision processes our certificates can be efficiently (and under many threat models exactly) computed. Furthermore, we investigate robust training procedures that increase the number of certifiably robust nodes while maintaining or improving the clean predictive accuracy.
Efficient New Approaches for Data-Driven Global Optimization, with Applications in Computer-Aided Design
Alongside derivative-based methods, which scale better to higher-dimensional problems, derivative-free methods play an essential role in the optimization of many practical engineering systems, especially those in which function evaluations are determined by statistical averaging, and those for which the function of interest is nonconvex in the adjustable parameters. This work focuses on the development of a new family of surrogate-based derivative-free optimization schemes, namely $\Delta$-DOGS schemes. The idea unifying this efficient and (under the appropriate assumptions) provably-globally-convergent family of schemes is the minimization of a search function which linearly combines a computationally inexpensive ''surrogate (that is, an interpolation, or in some cases a regression, of recent function evaluations - we generally favor some variant of polyharmonic splines for this purpose), to summarize the trends evident in the data available thus far, with a synthetic piecewise-quadratic ''uncertainty function (built on the framework of a Delaunay triangulation of existing datapoints), to characterize the reliability of the surrogate by quantifying the distance of any given point in parameter space to the nearest function evaluations. The grid is successively refined as convergence is approached. Moreover, it handles the linear constraint domain. This work also introduces a method to scale the parameter domain under consideration based on adaptive variation of the seen data in the optimization process, thereby obtaining a significantly smoother surrogate.