Goto

Collaborating Authors

 Mathematical & Statistical Methods


Density Estimation with Distribution Element Trees

arXiv.org Machine Learning

The estimation of probability densities based on available data is a central task in many statistical applications. Especially in the case of large ensembles with many samples or high-dimensional sample spaces, computationally efficient methods are needed. We propose a new method that is based on a decomposition of the unknown distribution in terms of so-called distribution elements (DEs). These elements enable an adaptive and hierarchical discretization of the sample space with small or large elements in regions with smoothly or highly variable densities, respectively. The novel refinement strategy that we propose is based on statistical goodness-of-fit and pair-wise (as an approximation to mutual) independence tests that evaluate the local approximation of the distribution in terms of DEs. The capabilities of our new method are inspected based on several examples of different dimensionality and successfully compared with other state-of-the-art density estimators.


Key Machine Learning PreReq: Viewing Linear Algebra through the right lenses

@machinelearnbot

Think Sets and Functions, rather than manipulation of number arrays/rectangles: Linear Algebra is often introduced at the high-school level as computations one can perform on vectors and matrices - Matrix multiplication, Gauss elimination, Determinants, sometimes even Eigenvalue calculations, and I believe this introduction is quite detrimental to one's understanding of Linear Algebra. This computational approach continues on in many undergrad (and sometimes grad) level courses in Engineering and the Social Sciences. In fact, many Computer Scientists deal with Linear Algebra for decades of their professional life with this narrow (and in my opinion, harmful) view. I believe the right way to learn Linear Algebra is to view vectors as elements in a Set (Vector Space), and matrices as functions from one vector space to another. A vector of n numbers is an element in the vector space R n, and a m x n matrix is a function from R n to R m.


Spectrum Estimation from a Few Entries

arXiv.org Machine Learning

Singular values of a data in a matrix form provide insights on the structure of the data, the effective dimensionality, and the choice of hyper-parameters on higher-level data analysis tools. However, in many practical applications such as collaborative filtering and network analysis, we only get a partial observation. Under such scenarios, we consider the fundamental problem of recovering spectral properties of the underlying matrix from a sampling of its entries. We are particularly interested in directly recovering the spectrum, which is the set of singular values, and also in sample-efficient approaches for recovering a spectral sum function, which is an aggregate sum of the same function applied to each of the singular values. We propose first estimating the Schatten $k$-norms of a matrix, and then applying Chebyshev approximation to the spectral sum function or applying moment matching in Wasserstein distance to recover the singular values. The main technical challenge is in accurately estimating the Schatten norms from a sampling of a matrix. We introduce a novel unbiased estimator based on counting small structures in a graph and provide guarantees that match its empirical performance. Our theoretical analysis shows that Schatten norms can be recovered accurately from strictly smaller number of samples compared to what is needed to recover the underlying low-rank matrix. Numerical experiments suggest that we significantly improve upon a competing approach of using matrix completion methods.


Reconstructing undirected graphs from eigenspaces

arXiv.org Machine Learning

In this paper, we aim at recovering an undirected weighted graph of $N$ vertices from the knowledge of a perturbed version of the eigenspaces of its adjacency matrix $W$. For instance, this situation arises for stationary signals on graphs or for Markov chains observed at random times. Our approach is based on minimizing a cost function given by the Frobenius norm of the commutator $\mathsf{A} \mathsf{B}-\mathsf{B} \mathsf{A}$ between symmetric matrices $\mathsf{A}$ and $\mathsf{B}$. In the Erd\H{o}s-R\'enyi model with no self-loops, we show that identifiability (i.e., the ability to reconstruct $W$ from the knowledge of its eigenspaces) follows a sharp phase transition on the expected number of edges with threshold function $N\log N/2$. Given an estimation of the eigenspaces based on a $n$-sample, we provide support selection procedures from theoretical and practical point of views. In particular, when deleting an edge from the active support, our study unveils that our test statistic is the order of $\mathcal O(1/n)$ when we overestimate the true support and lower bounded by a positive constant when the estimated support is smaller than the true support. This feature leads to a powerful practical support estimation procedure. Simulated and real life numerical experiments assert our new methodology.



Joint Embedding of Graphs

arXiv.org Machine Learning

Feature extraction and dimension reduction for networks is critical in a wide variety of domains. Efficiently and accurately learning features for multiple graphs has important applications in statistical inference on graphs. We propose a method to jointly embed multiple undirected graphs. Given a set of graphs, the joint embedding method identifies a linear subspace spanned by rank one symmetric matrices and projects adjacency matrices of graphs into this subspace. The projection coefficients can be treated as features of the graphs. We also propose a random graph model which generalizes classical random graph model and can be used to model multiple graphs. We show through theory and numerical experiments that under the model, the joint embedding method produces estimates of parameters with small errors. Via simulation experiments, we demonstrate that the joint embedding method produces features which lead to state of the art performance in classifying graphs. Applying the joint embedding method to human brain graphs, we find it extract interpretable features that can be used to predict individual composite creativity index.


Introduction to Data Mining: Pang-Ning Tan, Michael Steinbach, Vipin Kumar: 9780136954712: Amazon.com: Books

@machinelearnbot

We used this book in a class which was my first academic introduction to data mining. The book's strengths are that it does a good job covering the field as it was around the 2008-2009 timeframe. Included are discussions of exploring data, classification, clustering, association analysis, cluster analysis, and anomaly detection. Additional bonus appendices cover some elements of linear algebra, dimensionality reduction, probability and statistics, regression analysis, and optimization, in case those concepts are fuzzy for the student. They're by no means thorough enough to learn the topic, merely to remind the reader of salient points they should remember.


Linear algebra cheat sheet for deep learning – Towards Data Science

#artificialintelligence

While participating in Jeremy Howard's excellent deep learning course I realized I was a little rusty on the prerequisites and my fuzziness was impacting my ability to understand concepts like backpropagation. I decided to put together a few wiki pages on these topics to improve my understanding. Here is a prettier version of my linear algebra page. In the context of deep learning, linear algebra is a mathematical toolbox that offers helpful techniques for manipulating groups of numbers simultaneously. It provides structures like vectors and matrices (spreadsheets) to hold these numbers and new rules for how to add, subtract, multiply, or divide them.


How and Why: Decorrelate Time Series

@machinelearnbot

When dealing with time series, the first step consists in isolating trends and periodicites. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well known stochastic process with a similar auto-correlation structure, such as ARMA processes, using tools such as Box and Jenkins. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions. A deeper investigation consists in isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise, or not.


Stochastic Newton and Quasi-Newton Methods for Large Linear Least-squares Problems

arXiv.org Machine Learning

We describe stochastic Newton and stochastic quasi-Newton approaches to efficiently solve large linear least-squares problems where the very large data sets present a significant computational burden (e.g., the size may exceed computer memory or data are collected in real-time). In our proposed framework, stochasticity is introduced in two different frameworks as a means to overcome these computational limitations, and probability distributions that can exploit structure and/or sparsity are considered. Theoretical results on consistency of the approximations for both the stochastic Newton and the stochastic quasi-Newton methods are provided. The results show, in particular, that stochastic Newton iterates, in contrast to stochastic quasi-Newton iterates, may not converge to the desired least-squares solution. Numerical examples, including an example from extreme learning machines, demonstrate the potential applications of these methods.