Mathematical & Statistical Methods
MILP for the Multi-objective VM Reassignment Problem
Saber, Takfarinas, Ventresque, Anthony, Marques-Silva, Joao, Thorburn, James, Murphy, Liam
Machine Reassignment is a challenging problem for constraint programming (CP) and mixed-integer linear programming (MILP) approaches, especially given the size of data centres. The multi-objective version of the Machine Reassignment Problem is even more challenging and it seems unlikely for CP or MILP to obtain good results in this context. As a result, the first approaches to address this problem have been based on other optimisation methods, including metaheuristics. In this paper we study under which conditions a mixed-integer optimisation solver, such as IBM ILOG CPLEX, can be used for the Multi-objective Machine Reassignment Problem. We show that it is useful only for small or medium-scale data centres and with some relaxations, such as an optimality tolerance gap and a limited number of directions explored in the search space. Building on this study, we also investigate a hybrid approach, feeding a metaheuristic with the results of CPLEX, and we show that the gains are important in terms of quality of the set of Pareto solutions (+126.9% against the metaheuristic alone and +17.8% against CPLEX alone) and number of solutions (8.9 times more than CPLEX), while the processing time increases only by 6% in comparison to CPLEX for execution times larger than 100 seconds.
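As a rough illustration of what a "set of Pareto solutions" means in the abstract above, the sketch below filters a list of candidate solutions down to its non-dominated subset. It assumes every objective is minimised; the function name `pareto_front` and the sample objective vectors are hypothetical and do not come from the paper, and this is not the authors' CPLEX or metaheuristic procedure.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is minimised."""
    def dominates(p, q):
        # p dominates q if it is no worse in every objective and strictly better in one
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective candidates (illustrative values only)
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
front = pareto_front(candidates)
print(front)
```

A larger or better-spread front is what the abstract's "quality" and "number of solutions" comparisons refer to, although the exact quality indicator used by the authors is not stated here.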
Bellman equation
A Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming.[1] It writes the "value" of a decision problem at a certain point in time in terms of the payoff from some initial choices and the "value" of the remaining decision problem that results from those initial choices. The Bellman equation was first applied to engineering control theory and to other topics in applied mathematics, and subsequently became an important tool in economic theory, though the basic concepts of dynamic programming are prefigured in John von Neumann and Oskar Morgenstern's Theory of Games and Economic Behavior and Abraham Wald's sequential analysis. In continuous-time optimization problems, the analogous equation is a partial differential equation that is called the Hamilton–Jacobi–Bellman equation.[4][5] In discrete time any multi-stage optimization problem can be solved by analyzing the appropriate Bellman equation.
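For concreteness, one common way to write the discrete-time, deterministic form is sketched below. The notation is an assumption made for illustration and is not taken from the text above: $x$ is the current state, $\Gamma(x)$ the set of feasible actions, $F(x,a)$ the immediate payoff, $T(x,a)$ the next state, and $0 < \beta < 1$ a discount factor.

```latex
V(x) \;=\; \max_{a \in \Gamma(x)} \Bigl\{ F(x, a) + \beta \, V\bigl(T(x, a)\bigr) \Bigr\}
```

The first term is the payoff from the initial choice, and the second is the discounted value of the decision problem that remains after that choice, matching the verbal description above.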
Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods
Liang, Guannan, Tong, Qianqian, Zhu, Chunjiang, Bi, Jinbo
Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points which, however, can be saddle points in nonconvex optimization. It has been observed that a stochastic gradient descent (SGD) step introduces anisotropic noise around saddle points for deep learning and non-convex half space learning problems, which indicates that SGD satisfies the correlated negative curvature (CNC) condition for these problems. Therefore, we propose to use a separate SGD step to help the SCSG method escape from strict saddle points, resulting in the CNC-SCSG method. The SGD step plays a role similar to noise injection but is more stable. We prove that the resultant algorithm converges to a second-order stationary point with a convergence rate of $\tilde{O}(\epsilon^{-2} \log(1/\epsilon))$, where $\epsilon$ is the pre-specified error tolerance. This convergence rate is independent of the problem dimension, and is faster than that of CNC-SGD. A more general framework is further designed to incorporate the proposed CNC-SCSG into any first-order method so that the method can escape saddle points. Simulation studies illustrate that the proposed algorithm can escape saddle points in far fewer epochs than the gradient descent methods perturbed by either noise injection or an SGD step.
Empirical Mode Modeling: A data-driven approach to recover and forecast nonlinear dynamics from noisy data
Park, Joseph, Pao, Gerald M, Stabenau, Erik, Sugihara, George, Lorimer, Thomas
Data-driven, model-free analytics are natural choices for discovery and forecasting of complex, nonlinear systems. Methods that operate in the system state-space require either an explicit multidimensional state-space or one approximated from available observations. Since observational data are frequently sampled with noise, the noise can corrupt the state-space representation and degrade analytical performance. Here, we evaluate the synthesis of empirical mode decomposition with empirical dynamic modeling, which we term empirical mode modeling, to increase the information content of state-space representations in the presence of noise. Evaluation of a mathematical application and an ecologically important geophysical application across three different state-space representations suggests that empirical mode modeling may be a useful technique for data-driven, model-free, state-space analysis in the presence of noise.
Low-Rank Sinkhorn Factorization
Scetbon, Meyer, Cuturi, Marco, Peyré, Gabriel
Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to approximate kernel matrices appearing in its iterations using low-rank factors. Another route lies instead in imposing low-rank constraints on the feasible set of couplings considered in OT problems, with no approximations on cost or kernel matrices. This route was first explored by Forrow et al., 2018, who proposed an algorithm tailored for the squared Euclidean ground cost, using a proxy objective that can be solved through the machinery of regularized 2-Wasserstein barycenters. Building on this, we introduce in this work a generic approach that aims at solving, in full generality, the OT problem under low-rank constraints with arbitrary costs. Our algorithm relies on an explicit factorization of low-rank couplings as a product of sub-coupling factors linked by a common marginal; similar to an NMF approach, we alternately update these factors. We prove the non-asymptotic stationary convergence of this algorithm and illustrate its efficiency on benchmark experiments.
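To show why matrix-vector products dominate the Sinkhorn algorithm mentioned above, here is a minimal sketch of the standard entropy-regularized Sinkhorn iteration in plain Python. It is the classical full-rank scheme, not the low-rank factorization the paper proposes; the function name `sinkhorn`, the regularization strength `eps`, the iteration count, and the toy histograms are all assumptions chosen for illustration.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iters=200):
    """Entropy-regularized OT coupling between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Row scaling: u = a / (K v), one matrix-vector product
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: v = b / (K^T u), another matrix-vector product
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Coupling P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem (illustrative values only)
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(len(a))) for j in range(len(b))]
```

Each iteration costs two products with the n-by-m kernel, which is the expense that both low-rank kernel approximations and low-rank coupling constraints try to reduce.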
Signal Processing on the Permutahedron: Tight Spectral Frames for Ranked Data Analysis
Chen, Ellen, DeJong, Jennifer, Halverson, Tom, Shuman, David I
Ranked data sets, where m judges/voters specify a preference ranking of n objects/candidates, are increasingly prevalent in contexts such as political elections, computer vision, recommender systems, and bioinformatics. The vote counts for each ranking can be viewed as an n! data vector lying on the permutahedron, which is a Cayley graph of the symmetric group with vertices labeled by permutations and an edge when two permutations differ by an adjacent transposition. Leveraging combinatorial representation theory and recent progress in signal processing on graphs, we investigate a novel, scalable transform method to interpret and exploit structure in ranked data. We represent data on the permutahedron using an overcomplete dictionary of atoms, each of which captures both smoothness information about the data (typically the focus of spectral graph decomposition methods in graph signal processing) and structural information about the data (typically the focus of symmetry decomposition methods from representation theory). These atoms have a more naturally interpretable structure than any known basis for signals on the permutahedron, and they form a Parseval frame, ensuring beneficial numerical properties such as energy preservation. We develop specialized algorithms and open software that take advantage of the symmetry and structure of the permutahedron to improve the scalability of the proposed method, making it more applicable to the high-dimensional ranked data found in applications.
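The graph described above can be built directly for small n. The sketch below enumerates the n! permutations as vertices and joins two of them when they differ by swapping two adjacent positions; treating "adjacent transposition" as a swap of adjacent positions is an assumption made here (swapping adjacent values gives an isomorphic graph), and the function name `permutahedron_edges` is hypothetical rather than part of the authors' software.

```python
from itertools import permutations


def permutahedron_edges(n):
    """Vertices and edges of the permutahedron on rankings of n objects."""
    vertices = list(permutations(range(1, n + 1)))
    edges = set()
    for p in vertices:
        for i in range(n - 1):
            # Swap the entries in adjacent positions i and i + 1
            q = list(p)
            q[i], q[i + 1] = q[i + 1], q[i]
            # frozenset stores each undirected edge once
            edges.add(frozenset((p, tuple(q))))
    return vertices, edges


vertices, edges = permutahedron_edges(4)
print(len(vertices), len(edges))
```

Every vertex has n - 1 neighbours, so the graph has n!(n - 1)/2 edges. The factorial growth in vertices is why the abstract emphasises scalability.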
If We Draw Graphs Like This, We Can Change Computers Forever
Jacob Holm was flipping through proofs from an October 2019 research paper he and colleague Eva Rotenberg--an associate professor in the department of applied mathematics and computer science at the Technical University of Denmark--had published online, when he discovered their findings had unwittingly given away a solution to a centuries-old graph problem. Holm, an assistant professor of computer science at the University of Copenhagen, was relieved no one had caught the solution first. "It was a real 'Eureka!' moment," he says. Holm and Rotenberg were trying to find a shortcut for determining whether a graph is "planar"--that is, if it could be drawn flat on a surface without any of its lines crossing each other (flat drawings of a graph are also called "embeddings"). "Putting it very bluntly, we formally quantified why something is a terrible drawing." To mathematicians, a graph often looks different than what most of us are taught in school.
Contrastive learning of strong-mixing continuous-time stochastic processes
Liu, Bingbin, Ravikumar, Pradeep, Risteski, Andrej
One of the paradigms of learning from unlabeled data that has seen a lot of recent work in various application domains is "self-supervised learning". These methods supervise the training process with information inherent to the data without requiring human annotations, and have been applied across computer vision, natural language processing, reinforcement learning and scientific domains. Despite their popularity, they are still not very well understood--on both the theoretical and empirical fronts--often requiring extensive trial and error to find the right pairing of architecture and learning method. In particular, it is often hard to pin down what exactly these methods are trying to learn, and it is even harder to determine their statistical and algorithmic complexity. The specific family of self-supervised approaches we focus on in this work is contrastive learning, which constructs different types of tuples by utilizing certain structures in the data and trains the model to identify the types. For an example in vision, Chen et al. (2020) apply two random augmentations (e.g.
"Number Theory," by Rosanna Warren
The four-and-a-half-foot black-backed rat snake swayed up and across the kitchen screen door, seeking a way in. So we know we're living with a patient You sit taut in your chair, whispering, as you probe the gaps between prime numbers. The opening through which your thought will glide suddenly into a lit space and be at home. In a shaky house, where wasps gnaw the walls.
A Stein Goodness of fit Test for Exponential Random Graph Models
We propose and analyse a novel nonparametric goodness of fit testing procedure for exchangeable exponential random graph models (ERGMs) when a single network realisation is observed. The test determines how likely it is that the observation is generated from a target unnormalised ERGM density. Our test statistics are derived from a kernel Stein discrepancy, a divergence constructed via Stein's method using functions in a reproducing kernel Hilbert space, combined with a discrete Stein operator for ERGMs. The test is a Monte Carlo test based on simulated networks from the target ERGM. We show theoretical properties for the testing procedure for a class of ERGMs. Simulation studies and real network applications are presented.