Mathematical & Statistical Methods
A near-optimal stochastic gradient method for decentralized non-convex finite-sum optimization
Xin, Ran, Khan, Usman A., Kar, Soummya
This paper describes a $near$-$optimal$ stochastic first-order gradient method for decentralized finite-sum minimization of smooth non-convex functions. Specifically, we propose GT-SARAH that employs a local SARAH-type variance reduction and global gradient tracking to address the stochastic and decentralized nature of the problem. Considering a total number of $N$ cost functions, equally divided over a directed network of $n$ nodes, we show that GT-SARAH finds an $\epsilon$-accurate first-order stationary point in ${\mathcal{O}(N^{1/2}\epsilon^{-1})}$ gradient computations across all nodes, independent of the network topology, when ${n\leq\mathcal{O}(N^{1/2}(1-\lambda)^{3})}$, where ${(1-\lambda)}$ is the spectral gap of the network weight matrix. In this regime, GT-SARAH is thus, to the best our knowledge, the first decentralized method that achieves the algorithmic lower bound for this class of problems. Moreover, GT-SARAH achieves a $non$-$asymptotic$ $linear$ $speedup$, in that, the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the near-optimal algorithms for this problem class that process all data at a single node. We also establish the convergence rate of GT-SARAH in other regimes, in terms of the relative sizes of the number of nodes $n$, total number of functions $N$, and the network spectral gap $(1-\lambda)$. Over infinite time horizon, we establish the almost sure and mean-squared convergence of GT-SARAH to a first-order stationary point.
A Review on Modern Computational Optimal Transport Methods with Applications in Biomedical Research
Zhang, Jingyi, Zhong, Wenxuan, Ma, Ping
Optimal transport has been one of the most exciting subjects in mathematics, starting from the 18th century. As a powerful tool to transport between two probability measures, optimal transport methods have been reinvigorated nowadays in a remarkable proliferation of modern data science applications. To meet the big data challenges, various computational tools have been developed in the recent decade to accelerate the computation for optimal transport methods. In this review, we present some cutting-edge computational optimal transport methods with a focus on the regularization-based methods and the projection-based methods. We discuss their real-world applications in biomedical research.
SeqROCTM: A Matlab toolbox for the analysis of Sequence of Random Objects driven by Context Tree Models
Duarte, Aline, Hernández, Noslen
In several research problems we face probabilistic sequences of inputs (e.g., sequence of stimuli) from which an agent generates a corresponding sequence of responses and it is of interest to model/discover some kind of relation between them. To model such relation in the context of statistical learning in neuroscience, a new class of stochastic process have been introduced [5], namely sequences of random objects driven by context tree models. In this paper we introduce a freely available Matlab toolbox (SeqROCTM) that implements three model selection methods to make inference about the parameters of this kind of stochastic process.
Amazon.com: Introduction to Algorithms, third edition eBook: Cormen, Thomas H., Leiserson, Charles E., Rivest, Ronald L., Stein, Clifford: Kindle Store
Introduction to Algorithms, the'bible' of the field, is a comprehensive textbook covering the full spectrum of modern algorithms: from the fastest algorithms and data structures to polynomial-time algorithms for seemingly intractable problems, from classical algorithms in graph theory to special algorithms for string matching, computational geometry, and number theory. The revised third edition notably adds a chapter on van Emde Boas trees, one of the most useful data structures, and on multithreaded algorithms, a topic of increasing importance.
Categorical Stochastic Processes and Likelihood
In this work we take a Category Theoretic perspective on the relationship between probabilistic modeling and function approximation. We begin by defining two extensions of function composition to stochastic process subordination: one based on the co-Kleisli category under the comonad (Omega x -) and one based on the parameterization of a category with a Lawvere theory. We show how these extensions relate to the category Stoch and other Markov Categories. Next, we apply the Para construction to extend stochastic processes to parameterized statistical models and we define a way to compose the likelihood functions of these models. We conclude with a demonstration of how the Maximum Likelihood Estimation procedure defines an identity-on-objects functor from the category of statistical models to the category of Learners. Code to accompany this paper can be found at https://github.com/dshieble/Categorical_Stochastic_Processes_and_Likelihood
Transfer learning for nonlinear dynamics and its application to fluid turbulence
Inubushi, Masanobu, Goto, Susumu
We introduce transfer learning for nonlinear dynamics, which enables efficient predictions of chaotic dynamics by utilizing a small amount of data. For the Lorenz chaos, by optimizing the transfer rate, we accomplish more accurate inference than the conventional method by an order of magnitude. Moreover, a surprisingly small amount of learning is enough to infer the energy dissipation rate of the Navier-Stokes turbulence because we can, thanks to the small-scale universality of turbulence, transfer a large amount of the knowledge learned from turbulence data at lower Reynolds number.
Data Science complete guide on Linear Algebra - DeepLearning
Then, this course is for you. The Common mistake by a data scientist is Applying the tools without the intuition of how it works and behaves. Having the solid foundation of mathematics will help you to understand how each algorithms work, its limitations and its underlying assumptions. It always pays to know the machinery under the hood, rather than being a guy who is just behind the wheel with no knowledge about the car. Linear Algebra is one of the area where everyone agrees to be a starting point in learning curve of Machine Learning, Data Science and Artificial intelligence.
Tensor Clustering with Planted Structures: Statistical Optimality and Computational Limits
This paper studies the statistical and computational limits of high-order clustering with planted structures. We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC), and study the methods and theory for testing whether a cluster exists (detection) and identifying the support of cluster (recovery). Specifically, we identify the sharp boundaries of signal-to-noise ratio for which CHC and ROHC detection/recovery are statistically possible. We also develop the tight computational thresholds: when the signal-to-noise ratio is below these thresholds, we prove that polynomial-time algorithms cannot solve these problems under the computational hardness conjectures of hypergraphic planted clique (HPC) detection and hypergraphic planted dense subgraph (HPDS) recovery. We also propose polynomial-time tensor algorithms that achieve reliable detection and recovery when the signal-to-noise ratio is above these thresholds. Both sparsity and tensor structures yield the computational barriers in high-order tensor clustering. The interplay between them results in significant differences between high-order tensor clustering and matrix clustering in literature in aspects of statistical and computational phase transition diagrams, algorithmic approaches, hardness conjecture, and proof techniques. To our best knowledge, we are the first to give a thorough characterization of the statistical and computational trade-off for such a double computational-barrier problem. Finally, we provide evidence for the computational hardness conjectures of HPC detection and HPDS recovery.
Learning to rank via combining representations
Helm, Hayden S., Basu, Amitabh, Athreya, Avanti, Park, Youngser, Vogelstein, Joshua T., Winding, Michael, Zlatic, Marta, Cardona, Albert, Bourke, Patrick, Larson, Jonathan, White, Chris, Priebe, Carey E.
Learning to rank - producing a ranked list of items specific to a query and with respect to a set of supervisory items - is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes a good ranking is available. Instead, we have a collection of representations and supervisory information consisting of a (target item, interesting items set) pair. We demonstrate - analytically, in simulation, and in real data examples - that learning to rank via combining representations using an integer linear program is effective when the supervision is as light as "these few items are similar to your item of interest." While this nomination task is of general interest, for specificity we present our methodology from the perspective of vertex nomination in graphs. The methodology described herein is model agnostic. Introduction Given a query, a collection of items, and supervisory information, producing a ranked list relative to the query is of general interest. In particular, learning to rank [1] and algorithms from related problem settings [2] have been used to improve popular search engines and recommender systems and, impressively, aid in the identification of human traffickers [3]. When learning to rank, for each training query researchers typically have access to (feature vector, ordinal) pairs that are used to learn an ordinal regressor via fitting a model under a set of probabilistic assumptions [4] or via deep learning techniques [5] that generalize to ranking items for never-beforeseen queries.
Testing correlation of unlabeled random graphs
Wu, Yihong, Xu, Jiaming, Yu, Sophie H.
We study the problem of detecting the edge correlation between two random graphs with $n$ unlabeled nodes. This is formalized as a hypothesis testing problem, where under the null hypothesis, the two graphs are independently generated; under the alternative, the two graphs are edge-correlated under some latent node correspondence, but have the same marginal distributions as the null. For both Gaussian-weighted complete graphs and dense Erd\H{o}s-R\'enyi graphs (with edge probability $n^{-o(1)}$), we determine the sharp threshold at which the optimal testing error probability exhibits a phase transition from zero to one as $n\to \infty$. For sparse Erd\H{o}s-R\'enyi graphs with edge probability $n^{-\Omega(1)}$, we determine the threshold within a constant factor. The proof of the impossibility results is an application of the conditional second-moment method, where we bound the truncated second moment of the likelihood ratio by carefully conditioning on the typical behavior of the intersection graph (consisting of edges in both observed graphs) and taking into account the cycle structure of the induced random permutation on the edges. Notably, in the sparse regime, this is accomplished by leveraging the pseudoforest structure of subcritical Erd\H{o}s-R\'enyi graphs and a careful enumeration of subpseudoforests that can be assembled from short orbits of the edge permutation.