 Mathematical & Statistical Methods


Exploring Linear Algebra - Part 1: Estimating Route Costs

@machinelearnbot

This is my first entry in a series of articles on creative applications of linear algebra to real problems. This one was inspired by an Uber ride. So, imagine you are Google Maps, and your client wants to know the best path to take from point A to point B. If you have the city's map, it's easy, right? Just wearily apply Dijkstra's algorithm to find the shortest path, and that's your answer. But if you've taken enough Uber rides, you know that sometimes the shortest path also happens to be the worst-maintained one, or maybe the most jammed up.
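As a rough illustration of the shortest-path step mentioned above, here is a minimal sketch of Dijkstra's algorithm using Python's standard `heapq` module. The `city` map, its node names, and its edge costs are hypothetical, and the function name `dijkstra` is chosen only for this example. The edge cost is whatever you decide to minimize: distance, travel time, or a penalty for bad road conditions.

```python
import heapq


def dijkstra(graph, source):
    """Return the cheapest cost from source to every reachable node.

    graph maps each node to a list of (neighbor, cost) pairs,
    where every cost is assumed to be non-negative.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper route to node was already found
        for neighbor, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical toy road map (made up for illustration).
city = {
    "A": [("C", 2.0), ("B", 7.0)],
    "C": [("B", 3.0), ("D", 8.0)],
    "B": [("D", 1.0)],
    "D": [],
}
costs = dijkstra(city, "A")
```

In this toy map the direct road from A to B costs 7, while the detour through C costs 5, so the algorithm prefers the detour. Swapping the edge costs for estimated travel times changes the answer without changing the algorithm.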


Learning Graphs from Data: A Signal Representation Perspective

arXiv.org Machine Learning

The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the datasets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP graph inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data.


A Geometric Approach for Real-time Monitoring of Dynamic Large Scale Graphs: AS-level graphs illustrated

arXiv.org Machine Learning

The monitoring of large dynamic networks is a major challenge for a wide range of applications. The complexity stems from properties of the underlying graphs, in which slight local changes can lead to sizable variations of global properties, e.g., under certain conditions, a single link cut that may be overlooked during monitoring can result in splitting the graph into two disconnected components. Moreover, it is often difficult to determine whether a change will propagate globally or remain local. Traditional graph theory measures such as the centrality or the assortativity of the graph are not satisfactory for characterizing global properties of the graph. In this paper, we tackle the problem of real-time monitoring of dynamic large scale graphs by developing a geometric approach that leverages notions of geometric curvature and recent developments in graph embeddings using Ollivier-Ricci curvature [47]. We illustrate the use of our method by considering the practical case of monitoring dynamic variations of the global Internet using topology change information provided by combining several BGP feeds. In particular, we use our method to detect major events and changes via the geometry of the embedding of the graph.


Distributed Stochastic Gradient Tracking Methods

arXiv.org Machine Learning

In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method (DSGT) and a gossip-like stochastic gradient tracking method (GSGT). We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant stepsize choice). Under DSGT, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size $n$, which is comparable to the performance of a centralized stochastic gradient algorithm. Moreover, we show that when the network is well-connected, GSGT incurs lower communication cost than DSGT while maintaining a similar computational cost. A numerical example further demonstrates the effectiveness of the proposed methods.
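To make the gradient tracking idea concrete, the sketch below implements one common form of the update: each agent mixes its iterate with its neighbors' through a doubly stochastic matrix `W`, and keeps an auxiliary variable `y` that tracks the network-average gradient. This is an assumption about the general technique, not a reproduction of the paper's DSGT or GSGT. The function name, the scalar quadratic costs, the matrix `W`, and the step size are all hypothetical choices for illustration.

```python
import random


def gradient_tracking(W, grads, x0, step, iters, noise_std=0.0, seed=0):
    """Run a basic gradient tracking iteration with scalar decision variables.

    W         : doubly stochastic mixing matrix (list of lists), assumed given
    grads     : list of per-agent gradient functions of one scalar argument
    noise_std : std. dev. of zero-mean Gaussian noise added to each gradient
    """
    rng = random.Random(seed)
    n = len(x0)

    def noisy_grads(x):
        # Unbiased gradient estimates: true gradient plus zero-mean noise.
        return [grads[i](x[i]) + rng.gauss(0.0, noise_std) for i in range(n)]

    def mix(vec):
        # One round of weighted averaging with neighbors.
        return [sum(W[i][j] * vec[j] for j in range(n)) for i in range(n)]

    x = list(x0)
    g = noisy_grads(x)
    y = list(g)  # tracker starts at the initial gradient estimates
    for _ in range(iters):
        x = mix([x[i] - step * y[i] for i in range(n)])
        g_new = noisy_grads(x)
        mixed_y = mix(y)
        y = [mixed_y[i] + g_new[i] - g[i] for i in range(n)]
        g = g_new
    return x


# Hypothetical setup: agent i holds f_i(x) = 0.5 * (x - b_i)^2,
# so the minimizer of the average cost is the mean of the b_i.
targets = [1.0, 2.0, 6.0]
local_grads = [lambda x, b=b: x - b for b in targets]
W = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
x_final = gradient_tracking(W, local_grads, [0.0, 0.0, 0.0], step=0.1, iters=300)
```

With `noise_std=0.0` every agent's iterate settles at the mean of the targets. Setting `noise_std` above zero leaves the iterates hovering in a neighborhood of that point, which matches the qualitative behavior the abstract describes.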


Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications

arXiv.org Machine Learning

The von Neumann graph entropy (VNGE) facilitates the measurement of information divergence and distance between graphs in a graph sequence and has been successfully applied to various network learning tasks. Despite its effectiveness, it is computationally demanding because it requires the full eigenspectrum of the graph Laplacian matrix. In this paper, we propose a Fast Incremental von Neumann Graph EntRopy (FINGER) framework, which approximates VNGE with a performance guarantee. FINGER reduces the cubic complexity of VNGE to linear complexity in the number of nodes and edges, and thus enables online computation based on incremental graph changes. We also show asymptotic consistency of FINGER to the exact VNGE, and derive its approximation error bounds. Based on FINGER, we propose ultra-efficient algorithms for computing Jensen-Shannon distance between graphs. Our experimental results on different random graph models demonstrate the computational efficiency and the asymptotic consistency of FINGER. In addition, we also apply FINGER to two real-world applications and one synthesized dataset, and corroborate its superior performance over seven baseline graph similarity methods.
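To see why an eigenvalue-free computation is possible at all, consider the scaled Laplacian $\rho = L / \mathrm{trace}(L)$, whose eigenvalues $\lambda_i$ are non-negative and sum to one, with VNGE defined as $-\sum_i \lambda_i \ln \lambda_i$. Because $-\ln \lambda \ge 1 - \lambda$ on $(0, 1]$, the quantity $1 - \mathrm{trace}(\rho^2)$ is a lower bound on the entropy, and it can be read off node strengths and edge weights in a single pass. The sketch below computes only this quadratic proxy; it is an illustrative assumption, not the FINGER estimator itself, and the function name and edge-list format are hypothetical.

```python
def quadratic_entropy_proxy(edges):
    """Compute 1 - trace(rho^2) for rho = L / trace(L) without eigenvalues.

    edges: iterable of (u, v, weight) for an undirected graph, each edge
    listed once, with no self-loops. Runs in time linear in the edge count.
    """
    strength = {}      # weighted degree of each node (diagonal of L)
    sum_w_sq = 0.0     # sum of squared edge weights
    for u, v, w in edges:
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        sum_w_sq += w * w
    total = sum(strength.values())  # equals trace(L)
    if total == 0.0:
        raise ValueError("graph has no weighted edges")
    # trace(L^2) = sum of squared diagonal entries + squared off-diagonals;
    # each undirected edge contributes two off-diagonal entries of -w.
    trace_l_sq = sum(s * s for s in strength.values()) + 2.0 * sum_w_sq
    return 1.0 - trace_l_sq / (total * total)


# Hypothetical example: an unweighted triangle.
triangle = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0)]
q = quadratic_entropy_proxy(triangle)
```

For the triangle, the nonzero eigenvalues of $\rho$ are both $1/2$, so the exact entropy is $\ln 2 \approx 0.693$ while the proxy evaluates to $0.5$, consistent with it being a lower bound.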


Stable Geodesic Update on Hyperbolic Space and its Application to Poincare Embeddings

arXiv.org Machine Learning

A hyperbolic space has been shown to be more capable of modeling complex networks than a Euclidean space. This paper proposes an explicit update rule along geodesics in a hyperbolic space. The convergence of our algorithm is theoretically guaranteed, and the convergence rate is better than that of the conventional Euclidean gradient descent algorithm. Moreover, our algorithm avoids the "bias" problem of existing methods using the Riemannian gradient. Experimental results demonstrate the good performance of our algorithm in the Poincaré embeddings of knowledge base data.


Predictive Local Smoothness for Stochastic Gradient Methods

arXiv.org Machine Learning

Stochastic gradient methods are dominant in nonconvex optimization, especially for deep models, but have slow asymptotic convergence due to the fixed smoothness. To address this problem, we propose a simple yet effective method for improving stochastic gradient methods named predictive local smoothness (PLS). First, we create a convergence condition to build a learning rate which varies adaptively with local smoothness. Second, the local smoothness can be predicted by the latest gradients. Third, we use the adaptive learning rate to update the stochastic gradients for exploring linear convergence rates. By applying the PLS method, we implement new variants of three popular algorithms: PLS-stochastic gradient descent (PLS-SGD), PLS-accelerated SGD (PLS-AccSGD), and PLS-AMSGrad. Moreover, we provide much simpler proofs to ensure their linear convergence. Empirical results show that the variants outperform the popular algorithms, with gains such as faster convergence and alleviation of exploding and vanishing gradients.


Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time

arXiv.org Artificial Intelligence

In this paper, we propose a new adaptive stochastic gradient Langevin dynamics (ASGLD) algorithmic framework and its two specialized versions, namely adaptive stochastic gradient (ASG) and adaptive gradient Langevin dynamics (AGLD), for non-convex optimization problems. All proposed algorithms can escape from saddle points with at most $O(\log d)$ iterations, which is nearly dimension-free. Further, we show that ASGLD and ASG converge to a local minimum with at most $O(\log d/\epsilon^4)$ iterations. Also, ASGLD with full gradients or ASGLD with a slowly linearly increasing batch size converges to a local minimum with iterations bounded by $O(\log d/\epsilon^2)$, which outperforms existing first-order methods.


Cancer Genomics Neural Networks vs k-NN Classifiers

@machinelearnbot

Cancer Genomics Neural Networks vs k-NN Classifiers: Machine Learning for Python Hackers is a crash course in Data Science and Cancer Genomics for anyone interested in cancer research. The course starts out by loading a cancer dataset and splitting it into training and test sets. This course is unique in Data Science in that it uses the mglearn library for better visualization and is dedicated to providing enough detail that the student can follow along with no ambiguity.


Approximate Newton-based statistical inference using only stochastic gradients

arXiv.org Machine Learning

We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessian-vector product from first-order information. In theory, our method efficiently computes the statistical error covariance in $M$-estimation, both for unregularized convex learning problems and high-dimensional LASSO regression, without using exact second-order information or resampling the entire data set. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems that go even beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.
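The finite-difference idea mentioned above can be sketched in a few lines: for a small step $\delta$, the Hessian-vector product satisfies $Hv \approx (\nabla f(\theta + \delta v) - \nabla f(\theta)) / \delta$, so only gradient evaluations are needed. The example below uses exact gradients of a hypothetical two-dimensional quadratic rather than the stochastic gradients the paper works with; the function names and the value of `delta` are assumptions made for illustration.

```python
def hessian_vector_product(grad, theta, v, delta=1e-5):
    """Approximate H(theta) @ v with a forward difference of gradients.

    grad  : function mapping a list of parameters to a list of partials
    delta : finite-difference step size (a hypothetical default)
    """
    g0 = grad(theta)
    shifted = [t + delta * vi for t, vi in zip(theta, v)]
    g1 = grad(shifted)
    return [(a - b) / delta for a, b in zip(g1, g0)]


# Hypothetical objective f(theta) = 0.5 * theta^T A theta
# with A = [[2, 1], [1, 3]], so the Hessian is A everywhere.
def grad_quadratic(theta):
    x, y = theta
    return [2.0 * x + 1.0 * y, 1.0 * x + 3.0 * y]


hv = hessian_vector_product(grad_quadratic, [0.5, -1.0], [1.0, 2.0])
```

Here the exact product is $A v = (4, 7)$, and the finite-difference estimate agrees with it up to floating-point rounding because the gradient is linear. For a general objective the error also depends on `delta` and on how quickly the Hessian changes along `v`.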