 Mathematical & Statistical Methods


Cox process representation and inference for stochastic reaction-diffusion processes

arXiv.org Machine Learning

Complex behaviour in many systems arises from the stochastic interactions of spatially distributed particles or agents. Stochastic reaction-diffusion processes are widely used to model such behaviour in disciplines ranging from biology to the social sciences, yet they are notoriously difficult to simulate and calibrate to observational data. Here we use ideas from statistical physics and machine learning to provide a solution to the inverse problem of learning a stochastic reaction-diffusion process from data. Our solution relies on a non-trivial connection between stochastic reaction-diffusion processes and spatio-temporal Cox processes, a well-studied class of models from computational statistics. This connection leads to an efficient and flexible algorithm for parameter inference and model selection. Our approach shows excellent accuracy on numeric and real data examples from systems biology and epidemiology. Our work provides both insights into spatio-temporal stochastic systems, and a practical solution to a long-standing problem in computational modelling.
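The key building block referenced here is the Cox (doubly stochastic Poisson) process, in which the intensity driving a Poisson point pattern is itself random. The sketch below, in Python, illustrates only that building block and not the authors' inference algorithm; the log-normal bump intensity, its parameters, and the thinning scheme are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random intensity field on [0, 1]: a bump whose height and
# centre are themselves random -- the defining feature of a Cox
# (doubly stochastic Poisson) process.
height = rng.lognormal(mean=3.0, sigma=0.3)
centre = rng.uniform(0.2, 0.8)
intensity = lambda x: height * np.exp(-((x - centre) ** 2) / 0.02)

# Sample points by thinning a homogeneous Poisson process whose rate
# dominates the intensity everywhere on [0, 1].
lam_max = height                        # exp(...) <= 1, so height is an upper bound
n = rng.poisson(lam_max)                # candidate count on the unit interval
candidates = rng.uniform(0.0, 1.0, n)   # candidate locations
keep = rng.uniform(0.0, lam_max, n) < intensity(candidates)
points = candidates[keep]

print(f"Kept {points.size} of {n} candidate points")
```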


A Master of Umbral Moonshine Toys With String Theory

WIRED

After the Eyjafjallajökull volcano erupted in Iceland in 2010, flight cancellations left Miranda Cheng stranded in Paris. While waiting for the ash to clear, Cheng, then a postdoctoral researcher at Harvard University studying string theory, got to thinking about a paper that had recently been posted online. Its three coauthors had pointed out a numerical coincidence connecting far-flung mathematical objects. "That smells like another moonshine," Cheng recalled thinking. "Could it be another moonshine?" She happened to have read a book about the "monstrous moonshine," a mathematical structure that unfolded out of a similar bit of numerology: In the late 1970s, the mathematician John McKay noticed that 196,884, the first important coefficient of an object called the j-function, was the sum of one and 196,883, the first two dimensions in which a giant collection of symmetries called the monster group could be represented.



Lecture Notes on Randomized Linear Algebra

arXiv.org Machine Learning

These are lecture notes based on a class I taught on the topic of Randomized Linear Algebra (RLA) at UC Berkeley during the Fall 2013 semester.


Chi-Squared Test

#artificialintelligence

Before we build statistical or machine learning models, it is good practice to understand which predictors are significant and have an impact on the response variable. In this post we deal with the particular case where both the response and the predictor are categorical variables. By the end you will have gained an understanding of what predictive modelling is and what the significance and purpose of the chi-squared statistic are. We will go through a hypothetical case study to understand the math behind it, then implement a chi-squared test in R and learn to interpret the results.
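The post works the example in R; an analogous sketch in Python, using scipy.stats.chi2_contingency, is shown below. The contingency table is hypothetical and exists only to illustrate the mechanics of the test.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows are a categorical predictor
# (e.g. group A vs. group B), columns a categorical response (e.g. yes/no).
table = np.array([[30, 10],
                  [20, 25]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.4f}")
# A small p-value suggests the predictor and response are not independent.
```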


Selection of resources to learn Artificial Intelligence / Machine Learning / Statistical Inference… -- Artists and Machine Intelligence

#artificialintelligence

This is a very incomplete and subjective selection of resources for learning about the algorithms and mathematics of Artificial Intelligence (AI) / Machine Learning (ML) / Statistical Inference (SI) / Deep Learning (DL) / Reinforcement Learning (RL). It is aimed at beginners (those without a Computer Science background and with no prior knowledge of these subjects) and aims to take them to quite advanced levels (able to read and understand DL papers). It is not an exhaustive list and only contains learning materials that I have personally completed, so that I can include brief personal comments on them. It is also by no means the best path to follow (nowadays most MOOCs offer full paths all the way from basic statistics and linear algebra to ML/DL). But this is the path I took, and in a sense it is a partial documentation of my personal journey into DL (in fact I bounced around all of these back and forth like crazy).


The Spectral Condition Number Plot for Regularization Parameter Determination

arXiv.org Machine Learning

Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employ regularization to cope with the resulting singularity of the sample covariance matrix. These estimators depend on a penalty parameter whose value can be hard to choose: selection can be computationally infeasible, or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter selection. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.
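As a rough illustration of the idea (not the paper's full class of estimators), the following sketch tracks how the spectral condition number of a simple ridge-type estimator S + lambda*I falls as the penalty grows; the data dimensions and penalty grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# High-dimensional toy data: p variables, n < p observations, so the sample
# covariance matrix S is singular.
n, p = 30, 60
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)

# A simple ridge-type estimator S + lambda * I (one member of the broader
# class of ridge-type estimators). The spectral condition number plot tracks
# how the condition number kappa drops as the penalty grows.
for lam in np.logspace(-4, 1, 6):
    eigvals = np.linalg.eigvalsh(S + lam * np.eye(p))
    kappa = eigvals[-1] / eigvals[0]
    print(f"lambda = {lam:8.4f}   condition number = {kappa:12.2f}")
```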


Limit theorems for eigenvectors of the normalized Laplacian for random graphs

arXiv.org Machine Learning

We prove a central limit theorem for the components of the eigenvectors corresponding to the $d$ largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product graph. As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacian converge to multivariate normals and furthermore the mean and the covariance matrix of each row are functions of the associated vertex's block membership. Together with prior results for the eigenvectors of the adjacency matrix, we then compare, via the Chernoff information between multivariate normal distributions, how the choice of embedding method impacts subsequent inference. We demonstrate that neither embedding method dominates with respect to the inference task of recovering the latent block assignments.
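A minimal numerical sketch of the setting, assuming a two-block stochastic blockmodel and a plain eigendecomposition of the normalized matrix D^{-1/2} A D^{-1/2}; the block probabilities and embedding dimension are illustrative, and no claim is made about matching the paper's exact scaling or limit theorem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-block stochastic blockmodel: within-block edge prob 0.5, between 0.1.
n_per_block = 100
B = np.array([[0.5, 0.1],
              [0.1, 0.5]])
labels = np.repeat([0, 1], n_per_block)
P = B[np.ix_(labels, labels)]
A = (rng.uniform(size=P.shape) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                              # symmetric adjacency, no self-loops

# Spectral embedding from the normalized matrix D^{-1/2} A D^{-1/2}
# (its top eigenvectors correspond to the smallest eigenvalues of the
# normalized Laplacian), scaled by the square roots of the eigenvalues.
deg = A.sum(axis=1)
Dinv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = Dinv_sqrt @ A @ Dinv_sqrt
eigvals, eigvecs = np.linalg.eigh(A_norm)   # ascending eigenvalues
d = 2
idx = np.argsort(eigvals)[-d:]              # d largest eigenvalues
embedding = eigvecs[:, idx] * np.sqrt(np.abs(eigvals[idx]))

# Rows of the embedding should separate into two clouds, one per block,
# which is the behaviour the limit theorem characterizes asymptotically.
print(embedding[:3])        # a few rows from block 0
print(embedding[-3:])       # a few rows from block 1
```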


Kernel Density Estimation for Dynamical Systems

arXiv.org Machine Learning

We study the density estimation problem with observations generated by certain dynamical systems that admit a unique underlying invariant Lebesgue density. Observations drawn from dynamical systems are not independent and moreover, usual mixing concepts may not be appropriate for measuring the dependence among these observations. By employing the $\mathcal{C}$-mixing concept to measure the dependence, we conduct statistical analysis on the consistency and convergence of the kernel density estimator. Our main results are as follows: First, we show that with properly chosen bandwidth, the kernel density estimator is universally consistent under the $L_1$-norm; Second, we establish convergence rates for the estimator with respect to several classes of dynamical systems under the $L_1$-norm. In the analysis, the density function $f$ is only assumed to be H\"{o}lder continuous, which is a weak assumption in the literature on nonparametric density estimation and also more realistic in the dynamical system context. Last but not least, we prove that the same convergence rates of the estimator under the $L_\infty$-norm and $L_1$-norm can be achieved when the density function is H\"{o}lder continuous, compactly supported and bounded. The bandwidth selection problem of the kernel density estimator for dynamical systems is also discussed in our study via numerical simulations.
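As a toy illustration of the setting (not the paper's $\mathcal{C}$-mixing analysis), the sketch below draws dependent observations from the logistic map, whose invariant density is known in closed form, and compares a Gaussian kernel density estimate against it; the bandwidth is fixed by hand rather than selected as in the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Observations from the logistic map x_{t+1} = 4 x_t (1 - x_t), a classical
# chaotic system whose invariant density is 1 / (pi * sqrt(x (1 - x))).
# Consecutive observations are dependent, which is the setting of the paper.
x = 0.3
samples = []
for _ in range(5000):
    x = 4.0 * x * (1.0 - x)
    samples.append(x)
samples = np.asarray(samples)

kde = gaussian_kde(samples, bw_method=0.05)   # bandwidth chosen by hand
grid = np.linspace(0.05, 0.95, 5)
true_density = 1.0 / (np.pi * np.sqrt(grid * (1.0 - grid)))
print("x     kde     true")
for g, est, tru in zip(grid, kde(grid), true_density):
    print(f"{g:.2f}  {est:.3f}  {tru:.3f}")
```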


Local identifiability of $l_1$-minimization dictionary learning: a sufficient and almost necessary condition

arXiv.org Machine Learning

We study the theoretical properties of learning a dictionary from $N$ signals $\mathbf x_i\in \mathbb R^K$ for $i=1,...,N$ via $l_1$-minimization. We assume that $\mathbf x_i$'s are $i.i.d.$ random linear combinations of the $K$ columns from a complete (i.e., square and invertible) reference dictionary $\mathbf D_0 \in \mathbb R^{K\times K}$. Here, the random linear coefficients are generated from either the $s$-sparse Gaussian model or the Bernoulli-Gaussian model. First, for the population case, we establish a sufficient and almost necessary condition for the reference dictionary $\mathbf D_0$ to be locally identifiable, i.e., a local minimum of the expected $l_1$-norm objective function. Our condition covers both sparse and dense cases of the random linear coefficients and significantly improves the sufficient condition by Gribonval and Schnass (2010). In addition, we show that for a complete $\mu$-coherent reference dictionary, i.e., a dictionary with absolute pairwise column inner-product at most $\mu\in[0,1)$, local identifiability holds even when the random linear coefficient vector has up to $O(\mu^{-2})$ nonzeros on average. Moreover, our local identifiability results also translate to the finite sample case with high probability provided that the number of signals $N$ scales as $O(K\log K)$.
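A small numerical illustration of the objective being analyzed, under assumptions chosen for convenience: an orthogonal reference dictionary, $s$-sparse Gaussian coefficients, and the empirical $l_1$ objective evaluated at $\mathbf D_0$ versus a nearby column-normalized perturbation. This only probes local minimality numerically; it is not the paper's proof technique.

```python
import numpy as np

rng = np.random.default_rng(3)

K, N, s = 8, 2000, 2          # signal dimension, sample size, sparsity level

# Complete reference dictionary: a random orthogonal matrix (square, invertible,
# unit-norm columns).
D0, _ = np.linalg.qr(rng.standard_normal((K, K)))

# s-sparse Gaussian coefficients: random supports with Gaussian values.
C = np.zeros((K, N))
for j in range(N):
    support = rng.choice(K, size=s, replace=False)
    C[support, j] = rng.standard_normal(s)
X = D0 @ C                    # observed signals

def l1_objective(D, X):
    """Empirical l1 objective: average ||D^{-1} x_i||_1 over the signals."""
    return np.abs(np.linalg.solve(D, X)).sum() / X.shape[1]

# Compare the objective at D0 with a small perturbation whose columns are
# re-normalized; when D0 is locally identifiable the perturbed value should
# typically be larger.
D_perturbed = D0 + 0.05 * rng.standard_normal((K, K))
D_perturbed /= np.linalg.norm(D_perturbed, axis=0, keepdims=True)

print("objective at D0        :", l1_objective(D0, X))
print("objective at perturbed :", l1_objective(D_perturbed, X))
```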