Mathematical & Statistical Methods
Bayesian learning of forest and tree graphical models
In Bayesian learning of Gaussian graphical model structure, it is common to restrict attention to certain classes of graphs and approximate the posterior distribution by repeatedly moving from one graph to another, using MCMC or methods such as stochastic shotgun search (SSS). I give two corrected versions of an algorithm for non-decomposable graphs and discuss random graph distributions, in particular as prior distributions. The main topic of the thesis is Bayesian structure-learning with forests or trees. Restricting attention to these graphs can be justified using theorems on random graphs. I describe how to use the Chow$\unicode{x2013}$Liu algorithm and the Matrix Tree Theorem to find the MAP forest and certain quantities in the posterior distribution on trees. I give adapted versions of MCMC and SSS for approximating the posterior distribution for forests and trees, and systems for storing these graphs so that it is easy to choose moves to neighbouring graphs. Experiments show that SSS with trees does well when the true graph is a tree or sparse graph. SSS with trees or forests does better than SSS with decomposable graphs in certain cases. Graph priors improve detection of hubs but need large ranges of probabilities. MCMC on forests fails to mix well and MCMC on trees is slower than SSS. (For a longer abstract see the thesis.)
Estimation of Riemannian distances between covariance operators and Gaussian processes
In this work we study two Riemannian distances between infinite-dimensional positive definite Hilbert-Schmidt operators, namely affine-invariant Riemannian and Log-Hilbert-Schmidt distances, in the context of covariance operators associated with functional stochastic processes, in particular Gaussian processes. Our first main results show that both distances converge in the Hilbert-Schmidt norm. Using concentration results for Hilbert space-valued random variables, we then show that both distances can be consistently and efficiently estimated from (i) sample covariance operators, (ii) finite, normalized covariance matrices, and (iii) finite samples generated by the given processes, all with dimension-independent convergence. Our theoretical analysis exploits extensively the methodology of reproducing kernel Hilbert space (RKHS) covariance and cross-covariance operators. The theoretical formulation is illustrated with numerical experiments on covariance operators of Gaussian processes.
How Big Data Carried Graph Theory Into New Dimensions
The mathematical language for talking about connections, which usually depends on networks--vertices (dots) and edges (lines connecting them)--has been an invaluable way to model real-world phenomena since at least the 18th century. But a few decades ago, the emergence of giant data sets forced researchers to expand their toolboxes and, at the same time, gave them sprawling sandboxes in which to apply new mathematical insights. Since then, said Josh Grochow, a computer scientist at the University of Colorado, Boulder, there's been an exciting period of rapid growth as researchers have developed new kinds of network models that can find complex structures and signals in the noise of big data. Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research develop ments and trends in mathe matics and the physical and life sciences. Grochow is among a growing chorus of researchers who point out that when it comes to finding connections in big data, graph theory has its limits.
Distributionally Robust Learning
Chen, Ruidi, Paschalidis, Ioannis Ch.
This monograph develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. Beginning with fundamental properties of the Wasserstein metric and the DRO formulation, we explore duality to arrive at tractable formulations and develop finite-sample, as well as asymptotic, performance guarantees. We consider a series of learning problems, including (i) distributionally robust linear regression; (ii) distributionally robust regression with group structure in the predictors; (iii) distributionally robust multi-output regression and multiclass classification, (iv) optimal decision making that combines distributionally robust regression with nearest-neighbor estimation; (v) distributionally robust semi-supervised learning, and (vi) distributionally robust reinforcement learning. A tractable DRO relaxation for each problem is being derived, establishing a connection between robustness and regularization, and obtaining bounds on the prediction and estimation errors of the solution. Beyond theory, we include numerical experiments and case studies using synthetic and real data. The real data experiments are all associated with various health informatics problems, an application area which provided the initial impetus for this work.
Lossy Compression for Lossless Prediction
Dubois, Yann, Bloem-Reddy, Benjamin, Ullrich, Karen, Maddison, Chris J.
Most data is automatically collected and only ever "seen" by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize the bit-rate required to ensure high performance on all predictive tasks that are invariant under a set of transformations, such as data augmentations. Based on our theory, we design unsupervised objectives for training neural compressors. Using these objectives, we train a generic image compressor that achieves substantial rate savings (more than $1000\times$ on ImageNet) compared to JPEG on 8 datasets, without decreasing downstream classification performance.
Revisit the Fundamental Theorem of Linear Algebra
This survey is meant to provide an introduction to the fundamental theorem of linear algebra and the theories behind them. Our goal is to give a rigorous introduction to the readers with prior exposure to linear algebra. Specifically, we provide some details and proofs of some results from (Strang, 1993). We then describe the fundamental theorem of linear algebra from different views and find the properties and relationships behind the views. The fundamental theorem of linear algebra is essential in many fields, such as electrical engineering, computer science, machine learning, and deep learning.
Revisit the Fundamental Theorem of Linear Algebra
This survey is meant to provide an introduction to the fundamental theorem of linear algebra and the theories behind them. Our goal is to give a rigorous introduction to the readers with prior exposure to linear algebra. Specifically, we provide some details and proofs of some results from (Strang, 1993). We then describe the fundamental theorem of linear algebra from different views and find the properties and relationships behind the views. The fundamental theorem of linear algebra is essential in many fields, such as electrical engineering, computer science, machine learning, and deep learning. This survey is primarily a summary of purpose, significance of important theories behind it. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in theory behind the fundamental theorem of linear algebra and rigorous analysis in order to seamlessly introduce its properties in four subspaces in subsequent sections. However, we clearly realize our inability to cover all the useful and interesting results and given the paucity of scope to present this discussion, e.g., the separated analysis of the (orthogonal) projection matrices. We refer the reader to literature in the field of linear algebra for a more detailed introduction to the related fields. Some excellent examples include (Rose, 1982; Strang, 2009; Trefethen and Bau III, 1997; Strang, 2019, 2021).
DuaLip: Solving extreme-scale linear programs for web applications
Building thriving internet or web products requires optimizing for multiple business metrics. In many applications, these business metrics can move in conflicting directions, and delicately balancing these tradeoffs can be vital for the success of a product. For example, emails and notifications are foundational components of many companies' growth strategies today. At LinkedIn, when deciding which emails and notifications to send, we focus on the engagement generated, while remaining mindful of the volume sent . Similarly, when providing edge recommendations, total edges formed, and the engagement on those edges can be at odds: optimizing for total connections might lead to a large number of edges that have no engagement, bringing no additional value to the members.
Risk Conditioned Neural Motion Planning
Huang, Xin, Feng, Meng, Jasour, Ashkan, Rosman, Guy, Williams, Brian
Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks. While existing mathematical programming methods offer theoretical guarantees in the context of constrained Markov decision processes, they either lack scalability in solving larger problems or produce conservative plans. Recent advances in deep reinforcement learning improve scalability by learning policy networks as function approximators. In this paper, we propose an extension of soft actor critic model to estimate the execution risk of a plan through a risk critic and produce risk-bounded policies efficiently by adding an extra risk term in the loss function of the policy network. We define the execution risk in an accurate form, as opposed to approximating it through a summation of immediate risks at each time step that leads to conservative plans. Our proposed model is conditioned on a continuous spectrum of risk bounds, allowing the user to adjust the risk-averse level of the agent on the fly. Through a set of experiments, we show the advantage of our model in terms of both computational time and plan quality, compared to a state-of-the-art mathematical programming baseline, and validate its performance in more complicated scenarios, including nonlinear dynamics and larger state space.
Scheduling Aerial Vehicles in an Urban Air Mobility Scheme
Rigas, Emmanouil S., Kolios, Panayiotis, Ellinas, Georgios
Highly populated cities face several challenges, one of them being the intense traffic congestion. In recent years, the concept of Urban Air Mobility has been put forward by large companies and organizations as a way to address this problem, and this approach has been rapidly gaining ground. This disruptive technology involves aerial vehicles (AVs) for hire than can be utilized by customers to travel between locations within large cities. This concept has the potential to drastically decrease traffic congestion and reduce air pollution, since these vehicles typically use electric motors powered by batteries. This work studies the problem of scheduling the assignment of AVs to customers, having as a goal to maximize the serviced customers and minimize the energy consumption of the AVs by forcing them to fly at the lowest possible altitude. Initially, an Integer Linear Program (ILP) formulation is presented, that is solved offline and optimally, followed by a near-optimal algorithm, that solves the problem incrementally, one AV at a time, to address scalability issues, allowing scheduling in problems involving large numbers of locations, AVs, and customer requests.