Statistical Learning: Overviews

Bayesian Optimization with Unknown Search Space Machine Learning

Applying Bayesian optimization to problems in which the search space is unknown is challenging. To address this problem, we propose a systematic volume expansion strategy for Bayesian optimization. We devise a strategy to guarantee that, over iterative expansions of the search space, our method can find a point whose function value is within epsilon of the objective function maximum. Without the need to specify any parameters, our algorithm iteratively and automatically triggers the minimal expansion required. We derive analytic expressions for when to trigger the expansion and by how much to expand. We also provide theoretical analysis to show that our method achieves epsilon-accuracy after a finite number of iterations. We evaluate our method on both benchmark test functions and machine learning hyper-parameter tuning tasks and show that it outperforms baselines.

Neural Density Estimation and Likelihood-free Inference Machine Learning

I consider two problems in machine learning and statistics: the problem of estimating the joint probability density of a collection of random variables, known as density estimation, and the problem of inferring model parameters when their likelihood is intractable, known as likelihood-free inference. The contribution of the thesis is a set of new methods for addressing these problems that are based on recent advances in neural networks and deep learning.

Hyperbolic Graph Neural Networks Machine Learning

Learning from graph-structured data is an important task in machine learning and artificial intelligence, for which Graph Neural Networks (GNNs) have shown great promise. Motivated by recent advances in geometric representation learning, we propose a novel GNN architecture for learning representations on Riemannian manifolds with differentiable exponential and logarithmic maps. We develop a scalable algorithm for modeling the structural properties of graphs, comparing Euclidean and hyperbolic geometry. In our experiments, we show that hyperbolic GNNs can lead to substantial improvements on various benchmark datasets.
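
As a rough illustration of the exponential and logarithmic maps mentioned above, the sketch below uses the Poincaré ball model with curvature -1 and the origin as base point. The choice of model, the base point, and the function names `exp_map_zero` and `log_map_zero` are assumptions made for illustration; the abstract does not specify them.

```python
import math

def exp_map_zero(v):
    """Map a tangent vector at the origin onto the Poincare ball (curvature -1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    scale = math.tanh(norm) / norm  # tanh keeps the image strictly inside the unit ball
    return [scale * x for x in v]

def log_map_zero(y):
    """Inverse of exp_map_zero: map a point of the ball back to the tangent space."""
    norm = math.sqrt(sum(x * x for x in y))
    if norm == 0.0:
        return [0.0] * len(y)
    scale = math.atanh(norm) / norm
    return [scale * x for x in y]

# Hypothetical tangent vector, mapped onto the ball and back.
tangent = [0.3, -0.4]
on_ball = exp_map_zero(tangent)
recovered = log_map_zero(on_ball)
```

Both maps are built from smooth functions, so a network layer could in principle apply a Euclidean operation in the tangent space and map the result back to the manifold, with gradients flowing through both steps.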

Learning Fair and Interpretable Representations via Linear Orthogonalization Machine Learning

To reduce human error and prejudice, many high-stakes decisions have been turned over to machine algorithms. However, recent research suggests that this does not remove discrimination, and can perpetuate harmful stereotypes. While algorithms have been developed to improve fairness, they typically face at least one of three shortcomings: they are not interpretable, they lose significant accuracy compared to unbiased equivalents, or they are not transferable across models. To address these issues, we propose a geometric method that removes correlations between data and any number of protected variables. Further, we can control the strength of debiasing through an adjustable parameter to address the tradeoff between model accuracy and fairness. The resulting features are interpretable and can be used with many popular models, such as linear regression, random forest and multilayer perceptrons. The resulting predictions are found to be more accurate and fair than several comparable fair AI algorithms across a variety of benchmark datasets. Our work shows that debiasing data is a simple and effective solution toward improving fairness.
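
One way to picture removing correlations with protected variables is plain linear residualization, sketched below. This is an assumption-laden illustration rather than the authors' procedure: the helper names (`center`, `dot`, `orthogonalize`), the Gram-Schmidt construction, and the way `strength` scales the removed component are all hypothetical.

```python
def center(col):
    """Subtract the mean so that dot products act as (unnormalized) covariances."""
    mean = sum(col) / len(col)
    return [c - mean for c in col]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(feature, protected, strength=1.0):
    """Remove the linear component of `feature` explained by the protected columns.

    strength=1.0 removes it fully; strength=0.0 returns the centered feature unchanged.
    """
    # Build an orthogonal basis for the centered protected variables (Gram-Schmidt).
    basis = []
    for z in protected:
        z = center(z)
        for b in basis:
            coef = dot(z, b) / dot(b, b)
            z = [zi - coef * bi for zi, bi in zip(z, b)]
        if dot(z, z) > 1e-12:  # skip columns that are linearly dependent
            basis.append(z)
    x = center(feature)
    out = list(x)
    for b in basis:
        coef = dot(x, b) / dot(b, b)
        out = [oi - strength * coef * bi for oi, bi in zip(out, b)]
    return out

# Hypothetical toy data: one feature and two binary protected variables.
feature = [1.0, 2.0, 3.0, 4.0, 5.5]
protected = [[0.0, 0.0, 1.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0, 1.0]]
cleaned = orthogonalize(feature, protected)
```

With `strength=1.0` the cleaned feature has zero sample covariance with every protected column; intermediate values leave part of the component in place, which is one simple way to trade fairness against accuracy.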

A Survey on Knowledge Graph Embeddings with Literals: Which model links better Literal-ly? Artificial Intelligence

Knowledge Graphs (KGs) are composed of structured information about a particular domain in the form of entities and relations. In addition to the structured information, KGs help facilitate interconnectivity and interoperability between different resources represented in the Linked Data Cloud. KGs have been used in a variety of applications such as entity linking, question answering, and recommender systems. However, KG applications suffer from high computational and storage costs. Hence, there is a need for a representation that maps high dimensional KGs into a low dimensional space, i.e., an embedding space, while preserving structural as well as relational information. This paper surveys KG embedding models that consider not only the structured information contained in the form of entities and relations in a KG but also the unstructured information represented as literals such as text, numerical values, and images. Along with a theoretical analysis and comparison of the methods proposed so far for generating KG embeddings with literals, an empirical evaluation of the different methods under identical settings has been performed for the general task of link prediction.

Harnessing the power of Topological Data Analysis to detect change points in time series Machine Learning

We introduce a novel geometry-oriented methodology, based on the emerging tools of topological data analysis, into the change point detection framework. The key rationale is that change points are likely to be associated with changes in the geometry behind the data generating process. While the applications of topological data analysis to change point detection are potentially very broad, in this paper we primarily focus on integrating topological concepts with the existing nonparametric methods for change point detection. In particular, the proposed geometry-oriented approach aims to enhance the detection accuracy of distributional regime shift locations. Our simulation studies suggest that integrating topological data analysis with some existing algorithms for change point detection leads to consistently more accurate detection results. We illustrate our methodology by applying it to two closely related environmental time series datasets, the ice phenology of Lake Baikal and the North Atlantic Oscillation indices, in a research query for a possible association between their estimated regime shift locations.

Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector Machine Learning

Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish, and the machine tool spindle. Most of the existing methods in the chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms, to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a form that is more amenable. Specifically, we focus on two different methods for featurizing persistence diagrams, Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including upmilling and downmilling, in addition to the same data with some added noise. Our results show that Carlsson coordinates and template functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise-robust descriptors for chatter detection.

Structured Low-Rank Algorithms: Theory, MR Applications, and Links to Machine Learning Machine Learning

In this survey, we provide a detailed review of recent advances in the recovery of continuous domain multidimensional signals from their few nonuniform (multichannel) measurements using structured low-rank matrix completion formulation. This framework is centered on the fundamental duality between the compactness (e.g., sparsity) of the continuous signal and the rank of a structured matrix, whose entries are functions of the signal. This property enables the reformulation of the signal recovery as a low-rank structured matrix completion, which comes with performance guarantees. We will also review fast algorithms that are comparable in complexity to current compressed sensing methods, which enables the application of the framework to large-scale magnetic resonance (MR) recovery problems. The remarkable flexibility of the formulation can be used to exploit signal properties that are difficult to capture by current sparse and low-rank optimization strategies. We demonstrate the utility of the framework in a wide range of MR imaging (MRI) applications, including highly accelerated imaging, calibration-free acquisition, MR artifact correction, and ungated dynamic MRI.
The slow nature of signal acquisition in magnetic resonance imaging (MRI), where the image is formed from a sequence of Fourier samples, often restricts the achievable spatial and temporal resolution in multidimensional static and dynamic imaging applications. Discrete compressed sensing (CS) methods provided a major breakthrough to accelerate the magnetic resonance (MR) signal acquisition by reducing the sampling burden. As described in an introductory article in this special issue [1], these algorithms exploited the sparsity of the discrete signal in a transform domain to recover the images from a few measurements.
In this paper, we review a continuous domain extension of CS using a structured low-rank (SLR) framework for the recovery of an image or a series of images from a few measurements using various compactness assumptions [2]-[22]. The general strategy of the SLR framework starts with defining a lifting operation to construct a structured matrix, whose entries are functions of the signal samples. The SLR algorithms exploit the dual relationships between the signal compactness properties (e.g., sparsity) and the rank of the lifted matrix. This dual relationship allows recovery of the signal from a few samples in the measurement domain as an SLR optimization problem. MJ and MM are with the University of Iowa, Iowa City, IA 52242. JCY is with the Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
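
The link between compactness and rank can be seen in a small one-dimensional example. The sketch below lifts a sum of two exponentials into a Hankel matrix and checks that a short filter annihilates every row, so the matrix is rank deficient. The test signal, the window width, and the name `hankel_lift` are illustrative assumptions; the lifting operators reviewed in the survey may differ.

```python
def hankel_lift(signal, width):
    """Stack sliding windows of `signal` as the rows of a Hankel matrix."""
    return [signal[i:i + width] for i in range(len(signal) - width + 1)]

# Hypothetical test signal: a sum of two decaying exponentials.
r1, r2 = 0.9, 0.5
signal = [2.0 * r1 ** n + 3.0 * r2 ** n for n in range(12)]

# Coefficients of the polynomial (z - r1)(z - r2), lowest power first.
annihilating_filter = [r1 * r2, -(r1 + r2), 1.0]

H = hankel_lift(signal, 3)

# Each row [x[n], x[n+1], x[n+2]] satisfies the same linear recurrence,
# so the filter lies in the null space of H.
residuals = [sum(h * c for h, c in zip(row, annihilating_filter)) for row in H]
max_residual = max(abs(r) for r in residuals)
```

Because the 10-by-3 matrix `H` has a nontrivial null vector, its rank is at most 2, matching the number of exponentials in the signal. Recovery methods of this kind use such rank deficiency as the prior when samples are missing.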

PREMA: Principled Tensor Data Recovery from Multiple Aggregated Views Machine Learning

Multidimensional data have become ubiquitous and are frequently involved in situations where the information is aggregated over multiple data atoms. The aggregation can be over time or other features, such as geographical location or group affiliation. We often have access to multiple aggregated views of the same data, each aggregated in one or more dimensions, especially when data are collected or measured by different agencies. However, data mining and machine learning models require detailed data for personalized analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. The goal of this paper is to reconstruct finer-scale data from multiple coarse views, aggregated over different (subsets of) dimensions. The proposed method, called PREMA, leverages low-rank tensor factorization tools to provide recovery guarantees under certain conditions. PREMA is flexible in the sense that it can perform disaggregation on data that have missing entries, i.e., partially observed. The proposed method considers challenging scenarios: i) the available views of the data are aggregated in two dimensions, i.e., double aggregation, and ii) the aggregation patterns are unknown. Experiments on real data from different domains, i.e., sales data from retail companies, crime counts, and weather observations, are presented to showcase the effectiveness of PREMA.

Characterization and Development of Average Silhouette Width Clustering Machine Learning

The purpose of this paper is to introduce a new clustering methodology. This paper is divided into three parts. In the first part we develop the axiomatic theory for the average silhouette width (ASW) index. There are different ways to investigate the quality and characteristics of clustering methods, such as validation indices using simulations and real data experiments, model-based theory, and non-model-based theory known as the axiomatic theory. In this work we not only take the empirical approach of validating clustering results through simulations, but also focus on the development of the axiomatic theory. In the second part we present a novel clustering methodology based on the optimization of the ASW index. We consider the problem of estimating the number of clusters and finding the corresponding clustering simultaneously. Two algorithms are proposed. The proposed algorithms are evaluated against several partitioning and hierarchical clustering methods. An intensive empirical comparison of the different distance metrics on the various clustering methods is conducted. In the third part we consider two application domains: novel single cell RNA sequencing datasets and rainfall data used to cluster weather stations.
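
For reference, the ASW index itself can be computed as in the minimal sketch below, which uses the standard silhouette definition with Euclidean distance. The function name `average_silhouette_width` and the toy data are assumptions for illustration, and the code evaluates a given partition only; it does not reproduce the two optimization algorithms proposed in the paper.

```python
import math

def average_silhouette_width(points, labels):
    """Mean silhouette over all points, using Euclidean distance.

    Assumes at least two clusters; points in singleton clusters contribute 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: silhouette is defined as 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = average_silhouette_width(points, [0, 0, 1, 1])
bad = average_silhouette_width(points, [0, 1, 0, 1])
```

A partition that matches the visible groups scores close to 1, while one that splits them scores below 0. Comparing the index across candidate partitions with different numbers of clusters is one simple way to choose that number.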