Collaborating Authors


Markov Modeling of Time-Series Data using Symbolic Analysis Machine Learning

Markov models are often used to capture the temporal patterns of sequential data for statistical learning applications. While the Hidden Markov modeling-based learning mechanisms are well studied in literature, we analyze a symbolic-dynamics inspired approach. Under this umbrella, Markov modeling of time-series data consists of two major steps -- discretization of continuous attributes followed by estimating the size of temporal memory of the discretized sequence. These two steps are critical for the accurate and concise representation of time-series data in the discrete space. Discretization governs the information content of the resultant discretized sequence. On the other hand, memory estimation of the symbolic sequence helps to extract the predictive patterns in the discretized data. Clearly, the effectiveness of signal representation as a discrete Markov process depends on both these steps. In this paper, we will review the different techniques for discretization and memory estimation for discrete stochastic processes. In particular, we will focus on the individual problems of discretization and order estimation for discrete stochastic process. We will present some results from literature on partitioning from dynamical systems theory and order estimation using concepts of information theory and statistical learning. The paper also presents some related problem formulations which will be useful for machine learning and statistical learning application using the symbolic framework of data analysis. We present some results of statistical analysis of a complex thermoacoustic instability phenomenon during lean-premixed combustion in jet-turbine engines using the proposed Markov modeling method.

The Efficient Shrinkage Path: Maximum Likelihood of Minimum MSE Risk Machine Learning

When linear models are fit to ill-conditioned or confounded narrow-data, TRACE plots are useful in demonstrating and justifying deliberately biased estimation. This makes TRACE diagnostics powerful "visual" displays. If advanced students of regression are trained in interpretation of Trace plots, they could help admininstrators capable of basic statistical thinking avoid misinterpretations of questionable regression coefficient estimates. All five types of ridge TRACE plots for a wide variety of ridge paths can be explored using R-functions. For example, the RXshrink aug.lars() function generates TRACE s for Least-Angle, Lasso and Forward Stagewise methods (Efron, Hastie, Johnstone and Tibshirani 2004; Hastie and

Solving and Learning Nonlinear PDEs with Gaussian Processes Machine Learning

We introduce a simple, rigorous, and unified framework for solving nonlinear partial differential equations (PDEs), and for solving inverse problems (IPs) involving the identification of parameters in PDEs, using the framework of Gaussian processes. The proposed approach (1) provides a natural generalization of collocation kernel methods to nonlinear PDEs and IPs, (2) has guaranteed convergence with a path to compute error bounds in the PDE setting, and (3) inherits the state-of-the-art computational complexity of linear solvers for dense kernel matrices. The main idea of our method is to approximate the solution of a given PDE with a MAP estimator of a Gaussian process given the observation of the PDE at a finite number of collocation points. Although this optimization problem is infinite-dimensional, it can be reduced to a finite-dimensional one by introducing additional variables corresponding to the values of the derivatives of the solution at collocation points; this generalizes the representer theorem arising in Gaussian process regression. The reduced optimization problem has a quadratic loss and nonlinear constraints, and it is in turn solved with a variant of the Gauss-Newton method. The resulting algorithm (a) can be interpreted as solving successive linearizations of the nonlinear PDE, and (b) is found in practice to converge in a small number (two to ten) of iterations in experiments conducted on a range of PDEs. For IPs, while the traditional approach has been to iterate between the identifications of parameters in the PDE and the numerical approximation of its solution, our algorithm tackles both simultaneously. Experiments on nonlinear elliptic PDEs, Burgers' equation, a regularized Eikonal equation, and an IP for permeability identification in Darcy flow illustrate the efficacy and scope of our framework.

Bayesian imaging using Plug & Play priors: when Langevin meets Tweedie Machine Learning

Since the seminal work of Venkatakrishnan et al. (2013), Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive Minimum Mean Square Error (MMSE) or Maximum A Posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image denoising algorithm. The PnP algorithms proposed in the literature mainly differ in the iterative schemes they use for optimisation or for sampling. In the case of optimisation schemes, some recent works guarantee the convergence to a fixed point, albeit not necessarily a MAP estimate. In the case of sampling schemes, to the best of our knowledge, there is no known proof of convergence. There also remain important open questions regarding whether the underlying Bayesian models and estimators are well defined, well-posed, and have the basic regularity properties required to support these numerical schemes. To address these limitations, this paper develops theory, methods, and provably convergent algorithms for performing Bayesian inference with PnP priors. We introduce two algorithms: 1) PnP-ULA (Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Stochastic Gradient Descent) for MAP inference. Using recent results on the quantitative convergence of Markov chains, we establish detailed convergence guarantees for these two algorithms under realistic assumptions on the denoising operators used, with special attention to denoisers based on deep neural networks. We also show that these algorithms approximately target a decision-theoretically optimal Bayesian model that is well-posed. The proposed algorithms are demonstrated on several canonical problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantification.

Inductive Inference in Supervised Classification Machine Learning

Inductive inference in supervised classification context constitutes to methods and approaches to assign some objects or items into different predefined classes using a formal rule that is derived from training data and possibly some additional auxiliary information. The optimality of such an assignment varies under different conditions due to intrinsic attributes of the objects being considered for such a task. One of these cases is when all the objects' features are discrete variables with a priori known categories. As another example, one can consider a modification of this case with a priori unknown categories. These two cases are the main focus of this thesis and based on Bayesian inductive theories, de Finetti type exchangeability is a suitable assumption that facilitates the derivation of classifiers in the former scenario. On the contrary, this type of exchangeability is not applicable in the latter case, instead, it is possible to utilise the partition exchangeability due to John Kingman. These two types of exchangeabilities are discussed and furthermore here I investigate inductive supervised classifiers based on both types of exchangeabilities. I further demonstrate that the classifiers based on de Finetti type exchangeability can optimally handle test items independently of each other in the presence of infinite amounts of training data while on the other hand, classifiers based on partition exchangeability still continue to benefit from joint labelling of all the test items. Additionally, it is shown that the inductive learning process for the simultaneous classifier saturates when the amount of test data tends to infinity.

Decision Theoretic Bootstrapping Machine Learning

The design and testing of supervised machine learning models combine two fundamental distributions: (1) the training data distribution (2) the testing data distribution. Although these two distributions are identical and identifiable when the data set is infinite; they are imperfectly known (and possibly distinct) when the data is finite (and possibly corrupted) and this uncertainty must be taken into account for robust Uncertainty Quantification (UQ). We present a general decision-theoretic bootstrapping solution to this problem: (1) partition the available data into a training subset and a UQ subset (2) take $m$ subsampled subsets of the training set and train $m$ models (3) partition the UQ set into $n$ sorted subsets and take a random fraction of them to define $n$ corresponding empirical distributions $\mu_{j}$ (4) consider the adversarial game where Player I selects a model $i\in\left\{ 1,\ldots,m\right\} $, Player II selects the UQ distribution $\mu_{j}$ and Player I receives a loss defined by evaluating the model $i$ against data points sampled from $\mu_{j}$ (5) identify optimal mixed strategies (probability distributions over models and UQ distributions) for both players. These randomized optimal mixed strategies provide optimal model mixtures and UQ estimates given the adversarial uncertainty of the training and testing distributions represented by the game. The proposed approach provides (1) some degree of robustness to distributional shift in both the distribution of training data and that of the testing data (2) conditional probability distributions on the output space forming aleatory representations of the uncertainty on the output as a function of the input variable.

Problem-fluent models for complex decision-making in autonomous materials research Machine Learning

This figure, reproduced in figure 1A is known by practitioners to provide a qualitative understanding of the overlapping domains of the respective methodologies, implying a continuum of models that offer various strengths and weaknesses for addressing specify types of materials modeling problems. While open to some interpretation, figures such as these are meant to illustrate a broad range of phenomena that can be captured, with each method achieving its own balance between ab initio calculation and empirical modeling. Indeed, our early work modeling of more complex phenomena within lattice-based kinetic Monte Carlo (KMC) atomistic simulations (figure 1B) is one example of attaining such a balance through empirically designed liquid local neighborhoods calibrated alongside analytical continuum models.

Hierarchical Bayesian Model for the Transfer of Knowledge on Spatial Concepts based on Multimodal Information Artificial Intelligence

This paper proposes a hierarchical Bayesian model based on spatial concepts that enables a robot to transfer the knowledge of places from experienced environments to a new environment. The transfer of knowledge based on spatial concepts is modeled as the calculation process of the posterior distribution based on the observations obtained in each environment with the parameters of spatial concepts generalized to environments as prior knowledge. We conducted experiments to evaluate the generalization performance of spatial knowledge for general places such as kitchens and the adaptive performance of spatial knowledge for unique places such as `Emma's room' in a new environment. In the experiments, the accuracies of the proposed method and conventional methods were compared in the prediction task of location names from an image and a position, and the prediction task of positions from a location name. The experimental results demonstrated that the proposed method has a higher prediction accuracy of location names and positions than the conventional method owing to the transfer of knowledge.

A Variational Inference Framework for Inverse Problems Machine Learning

We present a framework for fitting inverse problem models via variational Bayes approximations. This methodology guarantees flexibility to statistical model specification for a broad range of applications, good accuracy performances and reduced model fitting times, when compared with standard Markov chain Monte Carlo methods. The message passing and factor graph fragment approach to variational Bayes we describe facilitates streamlined implementation of approximate inference algorithms and forms the basis to software development. Such approach allows for supple inclusion of numerous response distributions and penalizations into the inverse problem model. Albeit our analysis is circumscribed to one- and two-dimensional response variables, we lay down an infrastructure where streamlining algorithmic steps based on nullifying weak interactions between variables are extendible to inverse problems in higher dimensions. Image processing applications motivated by biomedical and archaeological problems are included as illustrations.

Approximate Bayesian inference and forecasting in huge-dimensional multi-country VARs Machine Learning

The Panel Vector Autoregressive (PVAR) model is a popular tool for macroeconomic forecasting and structural analysis in multi-country applications since it allows for spillovers between countries in a very flexible fashion. However, this flexibility means that the number of parameters to be estimated can be enormous leading to over-parameterization concerns. Bayesian global-local shrinkage priors, such as the Horseshoe prior used in this paper, can overcome these concerns, but they require the use of Markov Chain Monte Carlo (MCMC) methods rendering them computationally infeasible in high dimensions. In this paper, we develop computationally efficient Bayesian methods for estimating PVARs using an integrated rotated Gaussian approximation (IRGA). This exploits the fact that whereas own country information is often important in PVARs, information on other countries is often unimportant. Using an IRGA, we split the the posterior into two parts: one involving own country coefficients, the other involving other country coefficients. Fast methods such as approximate message passing or variational Bayes can be used on the latter and, conditional on these, the former are estimated with precision using MCMC methods. In a forecasting exercise involving PVARs with up to $18$ variables for each of $38$ countries, we demonstrate that our methods produce good forecasts quickly.