Mathematical & Statistical Methods
Kernel-based Graph Learning from Smooth Signals: A Functional Viewpoint
Pu, Xingyue, Chau, Siu Lun, Dong, Xiaowen, Sejdinovic, Dino
The problem of graph learning concerns the construction of an explicit topological structure revealing the relationship between nodes representing data entities, which plays an increasingly important role in the success of many graph-based representations and algorithms in the field of machine learning and graph signal processing. In this paper, we propose a novel graph learning framework that incorporates the node-side and observation-side information, and in particular the covariates that help to explain the dependency structures in graph signals. To this end, we consider graph signals as functions in the reproducing kernel Hilbert space associated with a Kronecker product kernel, and integrate functional learning with smoothness-promoting graph learning to learn a graph representing the relationship between nodes. The functional learning increases the robustness of graph learning against missing and incomplete information in the graph signals. In addition, we develop a novel graph-based regularisation method which, when combined with the Kronecker product kernel, enables our model to capture both the dependency explained by the graph and the dependency due to graph signals observed under different but related circumstances, e.g. different points in time. The latter means the graph signals are free from the i.i.d. assumptions required by the classical graph learning models. Experiments on both synthetic and real-world data show that our methods outperform the state-of-the-art models in learning a meaningful graph topology from graph signals, in particular under heavy noise, missing values, and multiple dependency.
Multi-kernel Passive Stochastic Gradient Algorithms
Krishnamurthy, Vikram, Yin, George
This paper develops a novel passive stochastic gradient algorithm. In passive stochastic approximation, the stochastic gradient algorithm does not have control over the location where noisy gradients of the cost function are evaluated. Classical passive stochastic gradient algorithms use a kernel that approximates a Dirac delta to weigh the gradients based on how far they are evaluated from the desired point. In this paper we construct a multi-kernel passive stochastic gradient algorithm. The algorithm performs substantially better in high dimensional problems and incorporates variance reduction. We analyze the weak convergence of the multi-kernel algorithm and its rate of convergence. In numerical examples, we study the multi-kernel version of the LMS algorithm to compare the performance with the classical passive version.
An Optimal Hybrid Variance-Reduced Algorithm for Stochastic Composite Nonconvex Optimization
Liu, Deyi, Nguyen, Lam M., Tran-Dinh, Quoc
In this note we propose a new variant of the hybrid variance-reduced proximal gradient method in [7] to solve a common stochastic composite nonconvex optimization problem under standard assumptions. We simply replace the independent unbiased estimator in our hybrid- SARAH estimator introduced in [7] by the stochastic gradient evaluated at the same sample, leading to the identical momentum-SARAH estimator introduced in [2]. This allows us to save one stochastic gradient per iteration compared to [7], and only requires two samples per iteration. Our algorithm is very simple and achieves optimal stochastic oracle complexity bound in terms of stochastic gradient evaluations (up to a constant factor). Our analysis is essentially inspired by [7], but we do not use two different step-sizes.
ReLU activated Multi-Layer Neural Networks trained with Mixed Integer Linear Programs
Neural Networks typically learn by adjusting weights via nonlinear optimization in a training phase. Often, variants of gradient descent are used. These techniques require some differentiability. Therefore, non-smooth but piecewise linear activation functions like ReLU or the Heaviside function raise the question if techniques of linear and mixed integer linear programming are also suited for network training. Learning to near optimality can be performed with Linear Programs (LP) of exponential size for certain network architectures, see [2].
The Chi-Squared Test Statistic is a Must For Every Data Scientist: A Case Study in Customer Churn
The chi-square statistic is a useful tool for understanding the relationship between two categorical variables. For the sake of example, let's say you work for a tech company that has rolled out a new product and you want to assess the relationship between this product and customer churn. In the age of data, tech or otherwise, many companies undergo to risk of taking evidence that is either anecdotal or perhaps a high level visualization to indicate certainty of a given relationship. The chi-square statistic gives us a way to quantify and assess the strength of a given pair of categorical variables. Let's explore chi-square from this lens of customer churn.
Fractal Gaussian Networks: A sparse random graph model based on Gaussian Multiplicative Chaos
Ghosh, Subhroshekhar, Balasubramanian, Krishnakumar, Yang, Xiaochuan
We propose a novel stochastic network model, called Fractal Gaussian Network (FGN), that embodies well-defined and analytically tractable fractal structures. Such fractal structures have been empirically observed in diverse applications. FGNs interpolate continuously between the popular purely random geometric graphs (a.k.a. the Poisson Boolean network), and random graphs with increasingly fractal behavior. In fact, they form a parametric family of sparse random geometric graphs that are parametrized by a fractality parameter $\nu$ which governs the strength of the fractal structure. FGNs are driven by the latent spatial geometry of Gaussian Multiplicative Chaos (GMC), a canonical model of fractality in its own right. We asymptotically characterize the expected number of edges and triangle in FGNs. We then examine the natural question of detecting the presence of fractality and the problem of parameter estimation based on observed network data, in addition to fundamental properties of the FGN as a random graph model. We also explore fractality in community structures by unveiling a natural stochastic block model in the setting of FGNs.
Curvature-Dependant Global Convergence Rates for Optimization on Manifolds of Bounded Geometry
We give curvature-dependant convergence rates for the optimization of weakly convex functions defined on a manifold of 1-bounded geometry via Riemannian gradient descent and via the dynamic trivialization algorithm. In order to do this, we give a tighter bound on the norm of the Hessian of the Riemannian exponential than the previously known. We compute these bounds explicitly for some manifolds commonly used in the optimization literature such as the special orthogonal group and the real Grassmannian. Along the way, we present self-contained proofs of fully general bounds on the norm of the differential of the exponential map and certain cosine inequalities on manifolds, which are commonly used in optimization on manifolds.
Introduction to Linear Algebra for Applied Machine Learning with Python
Linear algebra is to machine learning as flour to bakery: every machine learning model is based in linear algebra, as every cake is based in flour. It is not the only ingredient, of course. Machine learning models need vector calculus, probability, and optimization, as cakes need sugar, eggs, and butter. Applied machine learning, like bakery, is essentially about combining these mathematical ingredients in clever ways to create useful (tasty?) models. This document contains introductory level linear algebra notes for applied machine learning. It is meant as a reference rather than a comprehensive review. If you ever get confused by matrix multiplication, don't remember what was the $L_2$ norm, or the conditions for linear independence, this can serve as a quick reference. It also a good introduction for people that don't need a deep understanding of linear algebra, but still want to learn about the fundamentals to read about machine learning or to use pre-packaged machine learning ...
Primary MS in Machine Learning - Machine Learning - CMU - Carnegie Mellon University
Incoming students must have a strong background in computer science, including a solid understanding of complexity theory and good programming skills, as well as a good background in mathematics. Specifically, the first-year courses assume at least one year of college-level probability and statistics, as well as matrix algebra and multivariate calculus. For our introductory ML course, there's a self-assessment test [PDF] which will give you some idea about the background we expect students to have (for the MS you're looking at the "modest requirements"). Generally, you need to have some reasonable programming skills, with experience in Matlab/R/scipy-numpy especially helpful, and Java and Python being more useful than C, and a solid math background, especially in probability/statistics, linear algebra, and matrix and tensor calculus. There was significant variation in all of these scores, and they are only a small portion of applicants' qualifications.
Ultrahigh dimensional instrument detection using graph learning: an application to high dimensional GIS-census data for house pricing
Xu, Ning, Fisher, Timothy C. G., Hong, Jian
The exogeneity bias and instrument validation have always been critical topics in statistics, machine learning and biostatistics. In the era of big data, such issues typically come with dimensionality issue and, hence, require even more attention than ever. In this paper we ensemble two well-known tools from machine learning and biostatistics -- stable variable selection and random graph -- and apply them to estimating the house pricing mechanics and the follow-up socio-economic effect on the 2010 Sydney house data. The estimation is conducted on an over-200-gigabyte ultrahigh dimensional database consisting of local education data, GIS information, census data, house transaction and other socio-economic records. The technique ensemble carefully improves the variable selection sparisty, stability and robustness to high dimensionality, complicated causal structures and the consequent multicollinearity, which is ultimately helpful on the data-driven recovery of a sparse and intuitive causal structure. The new ensemble also reveals its efficiency and effectiveness on endogeneity detection, instrument validation, weak instruments pruning and selection of proper instruments. From the perspective of machine learning, the estimation result both aligns with and confirms the facts of Sydney house market, the classical economic theories and the previous findings of simultaneous equations modeling. Moreover, the estimation result is totally consistent with and supported by the classical econometric tool like two-stage least square regression and different instrument tests (the code can be found at https://github.com/isaac2math/solar_graph_learning).