Goto

Collaborating Authors

 Mathematical & Statistical Methods


An Information-Geometric Distance on the Space of Tasks

arXiv.org Machine Learning

This paper computes a distance between tasks modeled as joint distributions on data and labels. We develop a stochastic process that transports the marginal on the data of the source task to that of the target task, and simultaneously updates the weights of a classifier initialized on the source task to track this evolving data distribution. The distance between two tasks is defined to be the shortest path on the Riemannian manifold of the conditional distribution of labels given data as the weights evolve. We derive connections of this distance with Rademacher complexity-based generalization bounds; distance between tasks computed using our method can be interpreted as the trajectory in weight space that keeps the generalization gap constant as the task distribution changes from the source to the target. Experiments on image classification datasets show that this task distance helps predict the performance of transfer learning: fine-tuning techniques have an easier time transferring to tasks that are close to each other under our distance.


The Mathematical Foundations of Manifold Learning

arXiv.org Artificial Intelligence

This is an edited version of my undergraduate thesis, submitted to the Harvard Mathematics Department in May 2020. It differs from the original thesis in one major respect, namely that this version omits the proofs of a number of theorems that are readily-available in other expositions. Whereas the original version reproduced these proofs in full, this version simply contains references to these proofs in other works. This thesis is built upon an extensive body of prior work in learning theory, graph theory, differential geometry, and manifold learning. In particular, I would like to thank Professors Lorenzo Rosasco and Tomaso Poggio for their lectures on statistical learning theory, Professor Daniel Spielman for his notes on spectral graph theory, Professor Yaiza Canzani for her notes on analysis on manifolds, and Professor Mikhail Belkin for his work on manifold learning. Finally, I wish to thank those people without whom I could never have written this thesis: my family, friends, and wonderful advisor Professor Arjun Manrai. Unlike the manifolds discussed herein, their support was truly boundless. I hope you enjoy and learn something from this thesis! If you have comments, corrections, or would like to contact me for anything else, feel free to email me.


Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs

arXiv.org Machine Learning

Collections of probability distributions arise in a variety of statistical applications ranging from user activity pattern analysis to brain connectomics. In practice these distributions are represented by histograms over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of histograms over such general domains. To this end, we introduce the intrinsic slicing construction that yields a novel class of Wasserstein distances on manifolds and graphs. These distances are Hilbert embeddable, which allows us to reduce the histogram collection comparison problem to the comparison of means in a high-dimensional Euclidean space. We develop a hypothesis testing procedure based on conducting t-tests on each dimension of this embedding, then combining the resulting p-values using recently proposed p-value combination techniques. Our numerical experiments in a variety of data settings show that the resulting tests are powerful and the p-values are well-calibrated. Example applications to user activity patterns, spatial data, and brain connectomics are provided.


Probabilistic learning on manifolds constrained by nonlinear partial differential equations for small datasets

arXiv.org Machine Learning

A novel extension of the Probabilistic Learning on Manifolds (PLoM) is presented. It makes it possible to synthesize solutions to a wide range of nonlinear stochastic boundary value problems described by partial differential equations (PDEs) for which a stochastic computational model (SCM) is available and depends on a vector-valued random control parameter. The cost of a single numerical evaluation of this SCM is assumed to be such that only a limited number of points can be computed for constructing the training dataset (small data). Each point of the training dataset is made up realizations from a vector-valued stochastic process (the stochastic solution) and the associated random control parameter on which it depends. The presented PLoM constrained by PDE allows for generating a large number of learned realizations of the stochastic process and its corresponding random control parameter. These learned realizations are generated so as to minimize the vector-valued random residual of the PDE in the mean-square sense. Appropriate novel methods are developed to solve this challenging problem. Three applications are presented. The first one is a simple uncertain nonlinear dynamical system with a nonstationary stochastic excitation. The second one concerns the 2D nonlinear unsteady Navier-Stokes equations for incompressible flows in which the Reynolds number is the random control parameter. The last one deals with the nonlinear dynamics of a 3D elastic structure with uncertainties. The results obtained make it possible to validate the PLoM constrained by stochastic PDE but also provide further validation of the PLoM without constraint.


Bootstrapping Neural Processes

arXiv.org Machine Learning

Unlike in the traditional statistical modeling for which a user typically hand-specify a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, NP learns a stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely on an assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits the flexibility. To this end, we propose the Boostrapping Neural Process (BNP), a novel extension of the NP family using the bootstrap. The bootstrap is a classical data-driven technique for estimating uncertainty, which allows BNP to learn the stochasticity in NPs without assuming a particular form. We demonstrate the efficacy of BNP on various types of data and its robustness in the presence of model-data mismatch.


Linear Algebra 101 -- Part 1

#artificialintelligence

I believe understanding fundamental concepts is crucial when it comes to learning something advanced. Because the fundamentals are the basis where you build your advanced knowledge on top of. If you put more things on top of the weak basis, it could break apart in the end, meaning you end up not fully understanding any of the materials you learned. Then you might need to go back again to learn the fundamentals before going back to learn the most exciting advanced materials which could be time consuming. Linear Algebra is one of the fundamental topics that you should be very comfortable with.


Random Geometric Graphs on Euclidean Balls

arXiv.org Machine Learning

We consider a latent space model for random graphs where a node $i$ is associated to a random latent point $X_i$ on the Euclidean unit ball. The probability that an edge exists between two nodes is determined by a ``link'' function, which corresponds to a dot product kernel. For a given class $\F$ of spherically symmetric distributions for $X_i$, we consider two estimation problems: latent norm recovery and latent Gram matrix estimation. We construct an estimator for the latent norms based on the degree of the nodes of an observed graph in the case of the model where the edge probability is given by $f(\langle X_i,X_j\rangle)=\mathbbm{1}_{\langle X_i,X_j\rangle\geq \tau}$, where $0<\tau<1$. We introduce an estimator for the Gram matrix based on the eigenvectors of observed graph and we establish Frobenius type guarantee for the error, provided that the link function is sufficiently regular in the Sobolev sense and that a spectral-gap-type condition holds. We prove that for certain link functions, the model considered here generates graphs with degree distribution that have tails with a power-law-type distribution, which can be seen as an advantage of the model presented here with respect to the classic Random Geometric Graph model on the Euclidean sphere. We illustrate our results with numerical experiments.


An Efficient Newton Method for Extreme Similarity Learning with Nonlinear Embeddings

arXiv.org Machine Learning

We study the problem of learning similarity by using nonlinear embedding models (e.g., neural networks) from all possible pairs. This problem is well-known for its difficulty of training with the extreme number of pairs. Existing optimization methods extended from stochastic gradient methods suffer from slow convergence and high complexity per pass of all possible pairs. Inspired by some recent works reporting that Newton methods are competitive for training certain types of neural networks, in this work, we novelly apply the Newton method for this problem. A prohibitive cost depending on the extreme number of pairs occurs if the Newton method is directly applied. We propose an efficient algorithm which successfully eliminates the cost. Our proposed algorithm can take advantage of second-order information and lower time complexity per pass of all possible pairs. Experiments conducted on large-scale data sets demonstrate that the proposed algorithm is more efficient than existing algorithms.


Linear Algebra 101 -- Part 9: Singular Value Decomposition (SVD)

#artificialintelligence

Singular Value Decomposition (SVD) is another type of decomposition. Unlike eigendecomposition where the matrix you want to decompose has to be a square matrix, SVD allows you to decompose a rectangular matrix (a matrix that has different numbers of rows and columns). This is often more useful in a real-life scenario since the rectangular matrix could represent a wide variety of data that's not a square matrix. First, let's look at the definition itself. As you can see, SVD decomposes the matrix into 3 different matrices.


A Perspective on Theoretical Computer Science in Latin America

Communications of the ACM

Theoretical computer science is everywhere, for TCS is concerned with the foundations of computing and computing is everywhere! In the last three decades, a vibrant Latin American TCS community has emerged: here, we describe and celebrate some of its many noteworthy achievements. Computer science became a distinct academic discipline in the 1950s and early 1960s. The first CS department in the U.S. was formed in 1962, and by the 1970s virtually every university in the U.S. had one. In contrast, by the late 1970s, just a handful of Latin American universities were actively conducting research in the area. Several CS departments were eventually established during the late 1980s. Often, theoreticians played a decisive role in the foundation of these departments. One key catalyst in articulating collaborations among the few but growing number of enthusiastic theoreticians who were active in the international academic arena was the foundation of regional conferences.