Optimization
Consistency issues in Gaussian Mixture Models reduction algorithms
In many contexts Gaussian Mixtures (GM) are used to approximate probability distributions, possibly time-varying. In some applications the number of GM components exponentially increases over time, and reduction procedures are required to keep them reasonably limited. The GM reduction (GMR) problem can be formulated by choosing different measures of the dissimilarity of GMs before and after reduction, like the Kullback-Leibler Divergence (KLD) and the Integral Squared Error (ISE). Since in no case the solution is obtained in closed form, many approximate GMR algorithms have been proposed in the past three decades, although none of them provides optimality guarantees. In this work we discuss the importance of the choice of the dissimilarity measure and the issue of consistency of all steps of a reduction algorithm with the chosen measure. Indeed, most of the existing GMR algorithms are composed by several steps which are not consistent with a unique measure, and for this reason may produce reduced GMs far from optimality. In particular, the use of the KLD, of the ISE and normalized ISE is discussed and compared in this perspective.
Discriminative Bayesian Filtering Lends Momentum to the Stochastic Newton Method for Minimizing Log-Convex Functions
To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively-specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
Efficient Hyperparameter Optimization for Physics-based Character Animation
Physics-based character animation has seen significant advances in recent years with the adoption of Deep Reinforcement Learning (DRL). However, DRL-based learning methods are usually computationally expensive and their performance crucially depends on the choice of hyperparameters. Tuning hyperparameters for these methods often requires repetitive training of control policies, which is even more computationally prohibitive. In this work, we propose a novel Curriculum-based Multi-Fidelity Bayesian Optimization framework (CMFBO) for efficient hyperparameter optimization of DRL-based character control systems. Using curriculum-based task difficulty as fidelity criterion, our method improves searching efficiency by gradually pruning search space through evaluation on easier motor skill tasks. We evaluate our method on two physics-based character control tasks: character morphology optimization and hyperparameter tuning of DeepMimic. Our algorithm significantly outperforms state-of-the-art hyperparameter optimization methods applicable for physics-based character animation. In particular, we show that hyperparameters optimized through our algorithm result in at least 5x efficiency gain comparing to author-released settings in DeepMimic.
Speeding up Computational Morphogenesis with Online Neural Synthetic Gradients
Zhang, Yuyu, Chi, Heng, Chen, Binghong, Tang, Tsz Ling Elaine, Mirabella, Lucia, Song, Le, Paulino, Glaucio H.
A wide range of modern science and engineering applications are formulated as optimization problems with a system of partial differential equations (PDEs) as constraints. These PDE-constrained optimization problems are typically solved in a standard discretize-then-optimize approach. In many industry applications that require high-resolution solutions, the discretized constraints can easily have millions or even billions of variables, making it very slow for the standard iterative optimizer to solve the exact gradients. In this work, we propose a general framework to speed up PDE-constrained optimization using online neural synthetic gradients (ONSG) with a novel two-scale optimization scheme. We successfully apply our ONSG framework to computational morphogenesis, a representative and challenging class of PDE-constrained optimization problems. Extensive experiments have demonstrated that our method can significantly speed up computational morphogenesis (also known as topology optimization), and meanwhile maintain the quality of final solution compared to the standard optimizer. On a large-scale 3D optimal design problem with around 1,400,000 design variables, our method achieves up to 7.5x speedup while producing optimized designs with comparable objectives.
Financial Engineering and Artificial Intelligence in Python
Created by Lazy Programmer Team, Lazy Programmer Inc.Preview this Course - GET COUPON CODE Have you ever thought about what would happen if you combined the power of machine learning and artificial intelligence with financial engineering? Today, you can stop imagining, and start doing. This course will teach you the core fundamentals of financial engineering, with a machine learning twist. We will cover must-know topics in financial engineering, such as: Exploratory data analysis, significance testing, correlations, alpha and beta Time series analysis, simple moving average, exponentially-weighted moving average Holt-Winters exponential smoothing model Efficient Market Hypothesis Random Walk Hypothesis Time series forecasting ("stock price prediction") Modern portfolio theory Efficient frontier / Markowitz bullet Mean-variance optimization Maximizing the Sharpe ratio Convex optimization with Linear Programming and Quadratic Programming Capital Asset Pricing Model (CAPM) Algorithmic trading (VIP only) Statistical Factor Models (VIP only) Regime Detection with Hidden Markov Models (VIP only) In addition, we will look at various non-traditional techniques which stem purely from the field of machine learning and artificial intelligence, such as: Classification models Unsupervised learning Reinforcement learning and Q-learning ***VIP-only sections (get it while it lasts!) You will learn exactly why their methodology is fundamentally flawed and why their results are complete nonsense.
DC3: A learning method for optimization with hard constraints
Donti, Priya L., Rolnick, David, Kolter, J. Zico
Large optimization problems with hard constraints arise in many settings, yet classical solvers are often prohibitively slow, motivating the use of deep networks as cheap "approximate solvers." Unfortunately, naive deep learning approaches typically cannot enforce the hard constraints of such problems, leading to infeasible solutions. In this work, we present Deep Constraint Completion and Correction (DC3), an algorithm to address this challenge. Specifically, this method enforces feasibility via a differentiable procedure, which implicitly completes partial solutions to satisfy equality constraints and unrolls gradient-based corrections to satisfy inequality constraints. We demonstrate the effectiveness of DC3 in both synthetic optimization tasks and the real-world setting of AC optimal power flow, where hard constraints encode the physics of the electrical grid. In both cases, DC3 achieves near-optimal objective values while preserving feasibility. Traditional approaches to constrained optimization are often expensive to run for large problems, necessitating the use of function approximators. Neural networks are highly expressive and fast to run, making them ideal as function approximators. However, while deep learning has proven its power for unconstrained problem settings, it has struggled to perform well in domains where it is necessary to satisfy hard constraints at test time. For example, in power systems, weather and climate models, materials science, and many other areas, data follows well-known physical laws, and violation of these laws can lead to answers that are unhelpful or even nonsensical.
Wireless Federated Learning (WFL) for 6G Networks -- Part II: The Compute-then-Transmit NOMA Paradigm
Bouzinis, Pavlos S., Diamantoulakis, Panagiotis D., Karagiannidis, George K.
As it has been discussed in the first part of this work, the utilization of advanced multiple access protocols and the joint optimization of the communication and computing resources can facilitate the reduction of delay for wireless federated learning (WFL), which is of paramount importance for the efficient integration of WFL in the sixth generation of wireless networks (6G). To this end, in this second part we introduce and optimize a novel communication protocol for WFL networks, that is based on non-orthogonal multiple access (NOMA). More specifically, the Compute-then-Transmit NOMA (CT-NOMA) protocol is introduced, where users terminate concurrently the local model training and then simultaneously transmit the trained parameters to the central server. Moreover, two different detection schemes for the mitigation of inter-user interference in NOMA are considered and evaluated, which correspond to fixed and variable decoding order during the successive interference cancellation process. Furthermore, the computation and communication resources are jointly optimized for both considered schemes, with the aim to minimize the total delay during a WFL communication round. Finally, the simulation results verify the effectiveness of CT-NOMA in terms of delay reduction, compared to the considered benchmark that is based on time-division multiple access.
Improving the filtering of Branch-And-Bound MDD solver (extended)
Gillard, Xavier, Coppé, Vianney, Schaus, Pierre, Cire, André Augusto
This paper presents and evaluates two pruning techniques to reinforce the efficiency of constraint optimization solvers based on multi-valued decision-diagrams (MDD). It adopts the branch-and-bound framework proposed by Bergman et al. in 2016 to solve dynamic programs to optimality. In particular, our paper presents and evaluates the effectiveness of the local-bound (LocB) and rough upper-bound pruning (RUB). LocB is a new and effective rule that leverages the approximate MDD structure to avoid the exploration of non-interesting nodes. RUB is a rule to reduce the search space during the development of bounded-width-MDDs. The experimental study we conducted on the Maximum Independent Set Problem (MISP), Maximum Cut Problem (MCP), Maximum 2 Satisfiability (MAX2SAT) and the Traveling Salesman Problem with Time Windows (TSPTW) shows evidence indicating that rough-upper-bound and local-bound pruning have a high impact on optimization solvers based on branch-and-bound with MDDs. In particular, it shows that RUB delivers excellent results but requires some effort when defining the model. Also, it shows that LocB provides a significant improvement automatically; without necessitating any user-supplied information. Finally, it also shows that rough-upper-bound and local-bound pruning are not mutually exclusive, and their combined benefit supersedes the individual benefit of using each technique.
High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling
Eriksson, Hannes, Dimitrakakis, Christos, Carlsson, Lars
We study the problem of performing automated experiment design for drug screening through Bayesian inference and optimisation. In particular, we compare and contrast the behaviour of linear-Gaussian models and Gaussian processes, when used in conjunction with upper confidence bound algorithms, Thompson sampling, or bounded horizon tree search. We show that non-myopic sophisticated exploration techniques using sparse tree search have a distinct advantage over methods such as Thompson sampling or upper confidence bounds in this setting. We demonstrate the significant superiority of the approach over existing and synthetic datasets of drug toxicity.